Abstract

This work studies the maximum possible sign rank of $N \times N$ sign matrices with a given
VC dimension $d$. For $d = 1$, this maximum is three. For $d = 2$, this maximum is $\tilde{\Theta}(N^{1/2})$.
For $d > 2$, similar but slightly less accurate statements hold. The lower bounds improve
over previous ones by Ben-David et al., and the upper bounds are novel.

The lower bounds are obtained by probabilistic constructions, using a theorem of Warren
in real algebraic topology. The upper bounds are obtained using a result of Welzl about
spanning trees with low stabbing number, and using the moment curve.

The upper bound technique is also used to: (i) provide estimates on the number of
classes of a given VC dimension, and the number of maximum classes of a given VC
dimension, answering a question of Frankl from '89, and (ii) design an efficient algorithm
that provides an $O(N/\log(N))$ multiplicative approximation for the sign rank.

We also observe a general connection between sign rank and spectral gaps which is
based on Forster's argument. Consider the $N \times N$ adjacency matrix of a $\Delta$ regular graph
with a second eigenvalue of absolute value $\lambda$ and $\Delta \le N/2$. We show that the sign rank
of the signed version of this matrix is at least $\Delta/\lambda$. We use this connection to prove the
existence of a maximum class $C \subseteq \{\pm 1\}^N$ with VC dimension 2 and sign rank $\tilde{\Theta}(N^{1/2})$.
This answers a question of Ben-David et al. regarding the sign rank of large VC classes.
We also describe limitations of this approach, in the spirit of the Alon-Boppana theorem.

We further describe connections to communication complexity, geometry, learning theory,
and combinatorics.
1 Introduction


Boolean matrices (with 0,1 entries) and sign matrices (with ±1 entries) naturally appear in
many areas of research^1. We use them, e.g., to represent set systems and graphs in combinatorics,
hypothesis classes in learning theory, and boolean functions in communication complexity.

This work further investigates the relation between two useful complexity measures on sign 
matrices. 


Definition (Sign rank). For a real matrix $M$ with no zero entries, let $\mathrm{sign}(M)$ denote the sign
matrix such that $(\mathrm{sign}(M))_{i,j} = \mathrm{sign}(M_{i,j})$ for all $i,j$. The sign rank of a sign matrix $S$ is
defined as

$$\text{sign-rank}(S) = \min\{\mathrm{rank}(M) : \mathrm{sign}(M) = S\},$$

where the rank is over the real numbers. It captures the minimum dimension of a real space in
which the matrix can be embedded using half spaces through the origin^2 (see for example [?]).


Definition (Vapnik-Chervonenkis dimension). The VC dimension of a sign matrix $S$, denoted
$VC(S)$, is defined as follows. A subset $C$ of the columns of $S$ is called shattered if each of the
$2^{|C|}$ different patterns of ones and minus ones appears in some row in the restriction of $S$ to the
columns in $C$. The VC dimension of $S$ is the maximum size of a shattered subset of columns. It
captures the size of the minimum $\epsilon$-net for the underlying set system [?, ?].
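
The definition can be checked directly on small matrices. The following is a minimal brute-force sketch, not an efficient algorithm; the function names `vc_dimension` and `signed_identity` are illustrative and do not come from the text. It enumerates column subsets by increasing size and stops at the first size for which no subset is shattered, which is valid because every subset of a shattered set is shattered.

```python
from itertools import combinations

def vc_dimension(S):
    """Brute-force VC dimension of a sign matrix S (list of rows, entries +1/-1)."""
    n_cols = len(S[0])
    best = 0
    for k in range(1, n_cols + 1):
        # A set of k columns is shattered if all 2**k sign patterns appear among the rows.
        shattered = any(
            len({tuple(row[j] for j in cols) for row in S}) == 2 ** k
            for cols in combinations(range(n_cols), k)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so larger sets cannot be
        best = k
    return best

def signed_identity(n):
    """The n x n matrix with 1 on the diagonal and -1 off the diagonal."""
    return [[1 if i == j else -1 for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    print(vc_dimension(signed_identity(4)))
```

On the signed identity matrix no pair of columns ever shows the pattern (1, 1), so the computed value is one.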


The VC dimension and the sign rank appear in various areas of computer science and mathematics.
One important example is learning theory, where the VC dimension captures the sample
complexity of learning in the PAC model [?, 65], and the sign rank relates to the generalization
guarantees of practical learning algorithms, such as support vector machines, large margin
classifiers, and kernel classifiers [46, 31, ?, 33, 22, 66]. Loosely speaking, the VC dimension
relates to learnability, while sign rank relates to learnability by linear classifiers. Another
example is communication complexity, where the sign rank is equivalent to the unbounded error
randomized communication complexity [54], and the VC dimension relates to one round
distributional communication complexity under product distributions [42].

The main focus of this work is how large the sign rank can be for a given VC dimension.
In learning theory, this question concerns the universality of linear classifiers. In communication
complexity, this concerns the difference between randomized communication complexity
with unbounded error and communication complexity under product distributions with
bounded error. Previous works have studied these differences from the communication complexity
perspective [62, 63] and the learning theory perspective [14]. In this work we provide
explicit matrices and stronger separations compared to those of [62, 63] and [14]. See the
discussions in Section 1.2 and Section 2.4 for more details.

^1 There is a standard transformation of a boolean matrix $B$ to the sign matrix $S = 2B - J$, where $J$ is the all 1
matrix. The matrix $S$ is called the signed version of $B$, and the matrix $B$ is called the boolean version of $S$.

^2 That is, the columns correspond to points in $\mathbb{R}^k$ and the rows to half spaces through the origin (i.e. collections
of all points $x \in \mathbb{R}^k$ so that $\langle x, v \rangle > 0$ for some fixed $v \in \mathbb{R}^k$).
1.1 Duality

We start by providing alternative descriptions of the VC dimension and sign rank, which demonstrate
that these notions are dual to each other. The sign rank of a sign matrix $S$ is the maximum
number $k$ such that

$\forall\, M$ such that $\mathrm{sign}(M) = S$ $\ \exists\, k$ columns $j_1, \dots, j_k$:
the columns $j_1, \dots, j_k$ are linearly independent in $M$.

The dual sign rank of $S$ is the maximum number $k$ such that

$\exists\, k$ columns $j_1, \dots, j_k$ $\ \forall\, M$ such that $\mathrm{sign}(M) = S$:
the columns $j_1, \dots, j_k$ are linearly independent in $M$.

It turns out that the dual sign rank is almost equivalent to the VC dimension (the proof is in
Section 3.1).

Proposition 1. $VC(S) \le \text{dual-sign-rank}(S) \le 2\,VC(S) + 1$.

As the dual sign rank is at most the sign rank, it follows that the VC dimension is at most 
the sign rank. This provides further motivation for studying the largest possible gap between 
sign rank and VC dimension; it is equivalent to the largest possible gap between the sign rank 
and the dual sign rank. 

It is worth noting that there are some interesting classes of matrices for which these quantities
are equal. One such example is the $2^n \times 2^n$ disjointness matrix DISJ, whose rows and
columns are indexed by all subsets of $[n]$, and $\mathrm{DISJ}_{x,y} = 1$ if and only if $|x \cap y| > 0$. For this
matrix both the sign rank and the dual sign rank are exactly $n + 1$.

1.2 Sign rank versus VC dimension 

The VC dimension is at most the sign rank. On the other hand, it has long been known that the sign rank
is not bounded from above by any function of the VC dimension. Alon, Haussler, and Welzl [6]
provided examples of $N \times N$ matrices with VC dimension 2 for which the sign rank tends to
infinity with $N$. [14] used ideas from [5] together with estimates concerning the Zarankiewicz
problem to show that many matrices with constant VC dimension (at least 4) have high sign
rank.

We further investigate the problem of determining or estimating the maximum possible sign
rank of $N \times N$ matrices with VC dimension $d$. Denote this maximum by $f(N, d)$. We are
mostly interested in fixed $d$ and $N$ tending to infinity.

We observe that there is a dichotomy between the behaviour of $f(N, d)$ when $d = 1$ and
when $d > 1$. The value of $f(N, 1)$ is 3, but for $d > 1$, the value of $f(N, d)$ tends to infinity with
$N$. We now discuss the behaviour of $f(N, d)$ in more detail, and describe our results.

We start with the case $d = 1$. The following theorem and claim imply that for all $N \ge 4$,

$$f(N, 1) = 3.$$
The following theorem, which was proved by [6], shows that for $d = 1$, matrices with high
sign rank do not exist. For completeness, we provide our simple and constructive proof in
Section 3.2.1.

Theorem 2 ([6]). If the VC dimension of a sign matrix $M$ is one then its sign rank is at most 3.

We also note that the bound 3 is tight (see Section 3.2.1 for a proof).

Claim 3. For $N \ge 4$, the $N \times N$ signed identity matrix (i.e. the matrix with 1 on the diagonal
and $-1$ off the diagonal) has VC dimension one and sign rank 3.

Next, we consider the case $d > 1$, starting with lower bounds on $f(N, d)$. As mentioned
above, two lower bounds were previously known: [6] showed that $f(N, 2) \ge \Omega(\log N)$. [14]
showed that $f(N, d) \ge \omega(N^{1 - 2/d - 1/2^{d/2}})$, for every fixed $d$, which provides a nontrivial result only
for $d \ge 4$. We prove the following stronger lower bound.

Theorem 4. The following lower bounds on $f(N, d)$ hold:

1. $f(N, 2) \ge \Omega(N^{1/2}/\log N)$.

2. $f(N, 3) \ge \Omega(N^{8/15}/\log N)$.

3. $f(N, 4) \ge \Omega(N^{2/3}/\log N)$.

4. For every fixed $d > 4$,

$$f(N, d) \ge \Omega\left(N^{1 - (d^2+5d+2)/(d^3+2d^2+3d)}/\log N\right).$$

To understand part 4 better, notice that

$$\frac{d^2 + 5d + 2}{d^3 + 2d^2 + 3d} = \frac{1}{d} + \frac{3d - 1}{d^3 + 2d^2 + 3d},$$

which is close to $1/d$ for large $d$. The proofs are described in Section 3.2, where we also discuss
the tightness of our arguments.
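
The decomposition used to read part 4 can be verified with exact rational arithmetic. The sketch below is only a sanity check of the identity between the fraction in the exponent and $1/d$ plus a correction term; the helper names `loss_term` and `split_form` are illustrative.

```python
from fractions import Fraction

def loss_term(d):
    """The fraction subtracted from 1 in the exponent of part 4."""
    return Fraction(d * d + 5 * d + 2, d ** 3 + 2 * d * d + 3 * d)

def split_form(d):
    """The same quantity written as 1/d plus a lower-order correction."""
    return Fraction(1, d) + Fraction(3 * d - 1, d ** 3 + 2 * d * d + 3 * d)

if __name__ == "__main__":
    for d in range(5, 11):
        assert loss_term(d) == split_form(d)
        exponent = 1 - loss_term(d)
        # Print d, the exact exponent, and its value next to 1 - 1/d for comparison.
        print(d, exponent, float(exponent), float(1 - Fraction(1, d)))
```

The correction term decays like $3/d^2$, so the exponent approaches $1 - 1/d$ from below as $d$ grows.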

What about upper bounds on $f(N, d)$? It is shown in [14] that for every matrix in a certain
class of N X N matrices with constant VC dimension, the sign rank is at most The 

proof uses the connection between sign rank and communication complexity. However, there is
no general upper bound for the sign rank of matrices of VC dimension $d$ in [14], and the authors
explicitly mention the absence of such a result.

Here we prove the following upper bounds, using a concrete embedding of matrices with 
low VC dimension in real space. 

Theorem 5. For every fixed $d \ge 2$,

$$f(N, d) \le O(N^{1 - 1/d}).$$

In particular, this determines $f(N, 2)$ up to a logarithmic factor:

$$\Omega(N^{1/2}/\log N) \le f(N, 2) \le O(N^{1/2}).$$

The above results imply existence of sign matrices with high sign rank. However, their 
proofs use counting arguments and hence do not provide a method of certifying high sign rank 
for explicit matrices. In the next section we show how one can derive a lower bound for the sign 
rank of many explicit matrices. 

1.3 Sign rank and spectral gaps 

Spectral properties of boolean matrices are known to be deeply related to their combinatorial
structure. Perhaps the best example is Cheeger's inequality which relates spectral gaps to combinatorial
expansion [?]. Here, we describe connections between spectral properties
of boolean matrices and the sign rank of their signed versions.

Proving strong lower bounds on the sign rank of sign matrices turned out to be a difficult
task. Alon, Frankl, and Rödl [5] were the first to prove that there are sign matrices with high sign
rank, but they did not provide explicit examples. Later on, a breakthrough of Forster [?] showed
how to prove lower bounds on the sign rank of explicit matrices, proving, specifically, that
Hadamard matrices have high sign rank. [55] proved that there is a function that is computed
by a small depth three boolean circuit, but with high sign rank. It is worth mentioning that no
explicit matrix whose sign rank is significantly larger than $N^{1/2}$ is known.

We focus on the case of regular matrices, but a similar discussion can be carried out more
generally. A boolean matrix is $\Delta$ regular if every row and every column in it has exactly $\Delta$
ones, and a sign matrix is $\Delta$ regular if its boolean version is $\Delta$ regular.

An $N \times N$ real matrix $M$ has $N$ singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_N \ge 0$. The largest
singular value of $M$ is also called its spectral norm $\|M\| = \sigma_1 = \max\{\|Mx\| : \|x\| \le 1\}$,
where $\|x\|^2 = \langle x, x \rangle$ with the standard inner product. If the ratio $\sigma_2(M)/\|M\|$ is bounded away
from one, or small, we say that $M$ has a spectral gap.

We prove that if B has a spectral gap then the sign rank of S is high. 

Theorem 6. Let $B$ be a $\Delta$ regular $N \times N$ boolean matrix with $\Delta \le N/2$, and let $S$ be its
signed version. Then,

$$\text{sign-rank}(S) \ge \frac{\Delta}{\sigma_2(B)}.$$

In many cases a spectral gap for $B$ implies that it has pseudorandom properties. This theorem
is another manifestation of this phenomenon since random sign matrices have high sign
rank (see [5]).

The theorem above provides a non trivial lower bound on the sign rank of $S$. There is a non
trivial upper bound as well. The sign rank of a $\Delta$ regular sign matrix is at most $2\Delta + 1$. Here
is a brief explanation of this upper bound (see [5] for a more detailed proof). Every row $i$ in $S$
has at most $2\Delta$ sign changes (i.e. columns $j$ so that $S_{i,j} \ne S_{i,j+1}$). This implies that for every
$i$, there is a real univariate polynomial $G_i$ of degree at most $2\Delta$ so that $G_i(j) S_{i,j} > 0$ for all
$j \in [N] \subset \mathbb{R}$. To see how this corresponds to sign rank at most $2\Delta + 1$, recall that evaluating
a polynomial $G$ of degree $2\Delta$ on a point $x \in \mathbb{R}$ corresponds to an inner product over $\mathbb{R}^{2\Delta+1}$
between the vector of coefficients of $G$, and the vector of powers of $x$.
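
The sign-change argument can be made concrete. The following is a minimal sketch under illustrative assumptions: columns are indexed $1, \dots, N$, each polynomial gets one root at the half-integer between two columns where the row changes sign, and integer coefficients are used so the check is exact. The names `row_polynomial` and `embed` are not from the text.

```python
def poly_mul_linear(coeffs, a, b):
    """Multiply a polynomial (coeffs[k] is the x**k coefficient) by (a*x + b)."""
    out = [0] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        out[k] += b * c
        out[k + 1] += a * c
    return out

def row_polynomial(row):
    """Integer polynomial G with sign(G(j)) == row[j-1] for j = 1..len(row)."""
    coeffs = [1]
    changes = 0
    for j in range(1, len(row)):
        if row[j - 1] != row[j]:  # sign change between columns j and j+1
            coeffs = poly_mul_linear(coeffs, 2, -(2 * j + 1))  # root at j + 1/2
            changes += 1
    # At x = 1 every factor is negative, so fix the overall sign to match row[0].
    lead = row[0] * (-1) ** changes
    return [lead * c for c in coeffs]

def embed(S):
    """Row vectors (polynomial coefficients) and column vectors (powers of j)."""
    polys = [row_polynomial(r) for r in S]
    dim = max(len(p) for p in polys)
    U = [p + [0] * (dim - len(p)) for p in polys]
    V = [[j ** k for k in range(dim)] for j in range(1, len(S[0]) + 1)]
    return U, V

if __name__ == "__main__":
    S = [[1, -1, -1, 1], [-1, -1, 1, 1], [1, 1, 1, 1]]
    U, V = embed(S)
    M = [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]
    assert all((M[i][j] > 0) == (S[i][j] == 1) for i in range(3) for j in range(4))
    print(len(U[0]))  # dimension used: (max number of sign changes) + 1
```

The matrix $M$ is a product of a matrix with `dim` columns and a matrix with `dim` rows, so its rank, and hence the sign rank of $S$, is at most one more than the largest number of sign changes in a row.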

Our proof of Theorem 6 and its limitations are discussed in detail in Section 3.3.


2 Applications 

2.1 Learning theory 

Universality of linear classifiers 

Linear classifiers have been central in the study of machine learning since the introduction of the
Perceptron algorithm in the 50's [57] and Support Vector Machines (SVM) in the 90's [20, ?].
The rise of kernel methods in the 90's [20, ?] enabled reducing many learning problems to
the framework of halfspaces, making linear classifiers a central algorithmic tool.

These methods use the following two-step approach. First, embed the hypothesis class^3 in
halfspaces of a Euclidean space (each point corresponds to a vector and for every hypothesis
$h$, the vectors corresponding to $h^{-1}(1)$ and the vectors corresponding to $h^{-1}(-1)$ are separated
by a hyperplane). Second, apply a learning algorithm for halfspaces.

If the embedding is to a low dimensional space then a good generalization rate is implied.
For embeddings to large dimensional spaces, SVM theory offers an alternative parameter,
namely the margin^4. Indeed, a large margin also implies a good generalization rate. On the other
hand, any embedding with a large margin can be projected to a low dimensional space using
standard dimension reduction arguments [?].

Ben-David, Eiron, and Simon [14] utilized it to argue that “... any universal learning machine,
which transforms data to a Euclidean space and then applies linear (or large margin)
classification, cannot preserve good generalization bounds in general.” Formally, they showed
that: For any fixed $d > 1$, most hypothesis classes $C \subseteq \{\pm 1\}^N$ of VC dimension $d$ have
sign-rank of $N^{\Omega(1)}$. As discussed in Section 1.2, Theorem 4 quantitatively improves over their
results.

In practice, linear classifiers are widely used in a variety of applications including handwriting
recognition, image classification, medical science, bioinformatics, and more. The practical
usefulness of linear classifiers and the argument of Ben-David, Eiron, and Simon manifest a
gap between practice and theory that seems worth studying. We next discuss how Theorem 5,
which provides a non-trivial upper bound on the sign rank, can be interpreted as theoretical
evidence which supports the practical usefulness of linear classifiers. Let $C \subseteq \{\pm 1\}^X$ be a
hypothesis class, and let $\gamma > 0$. We say that $C$ is $\gamma$-weakly represented by halfspaces if for
every finite $Y \subseteq X$, the sign rank of $C|_Y$ is at most $O(|Y|^{1-\gamma})$. In other words, there exists an
embedding of $Y$ in $\mathbb{R}^k$ with $k = O(|Y|^{1-\gamma})$ such that each hypothesis in $C|_Y$ corresponds to
a halfspace in the embedding. Theorem 5 shows that any class $C$ is $\gamma$-weakly represented by
halfspaces where $\gamma$ depends only on its VC dimension. Weak representations can be thought
of as providing a compressed representation of $C|_Y$ using half-spaces in a dimension that is
sublinear in $|Y|$. Such representations imply learnability; indeed, every $\gamma$-weakly represented
class $C$ is learnable, as the VC dimension of $C$ is bounded from above by some function
of $\gamma$. While these quantitative relations between the VC dimension and $\gamma$ may be rather loose,
they show that in principle, any learnable class has a weak representation by halfspaces which
certifies its learnability.

^3 In this context we use the more common term “hypothesis class” instead of “matrix.”

^4 The margin of the embedding is the minimum over all hypotheses $h$ of the distance between the convex hull
of the vectors corresponding to $h^{-1}(1)$ and the convex hull of the vectors corresponding to $h^{-1}(-1)$.

Figure 1: An arrangement of lines in the plane and the corresponding cells.

Maximum classes with large sign rank 

Let $C \subseteq \{\pm 1\}^N$ be a class with VC dimension $d$. The class $C$ is called maximum if it meets the
Sauer-Shelah bound [60] with equality^5. That is, $|C| = \sum_{i=0}^{d} \binom{N}{i}$. Maximum classes were
studied in different contexts such as machine learning, geometry, and combinatorics (e.g.
[?, ?, 35, ?, 10, ?, ?, 58, 59]).

There are several known examples of maximum classes. A fairly simple one is the Hamming
ball of radius $d$, i.e., the class of all vectors with weight at most $d$. Another set of examples
relates to the sign rank: Let $H$ be an arrangement of hyperplanes in $\mathbb{R}^d$. These hyperplanes cut $\mathbb{R}^d$
into cells; the connected components of $\mathbb{R}^d \setminus (\bigcup_{h \in H} h)$. Each cell $c$ is associated with a sign
vector $v_c \in \{\pm 1\}^H$ which describes the location of the cell relative to each of the hyperplanes.
See Figure 1 for a planar arrangement. The sign rank of such a class is at most $d + 1$. It is
known (see e.g. [35]) that if the hyperplanes are in general position then the sign vectors of the
cells form a maximum class of VC dimension $d$.
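
The Hamming ball example is easy to verify by brute force for small parameters. The sketch below is illustrative only: it takes the weight of a vector to be its number of $+1$ entries, and checks that the ball of radius $d$ in $\{\pm 1\}^N$ has exactly $\sum_{i=0}^{d} \binom{N}{i}$ elements and VC dimension $d$. The function names are not from the text.

```python
from itertools import combinations, product
from math import comb

def hamming_ball(n, d):
    """All vectors in {+1,-1}^n with at most d entries equal to +1."""
    return [v for v in product((1, -1), repeat=n) if v.count(1) <= d]

def vc_dimension(rows):
    """Brute-force VC dimension of a class given as a list of sign vectors."""
    n = len(rows[0])
    best = 0
    for k in range(1, n + 1):
        if any(len({tuple(r[j] for j in cols) for r in rows}) == 2 ** k
               for cols in combinations(range(n), k)):
            best = k
        else:
            break  # no k-set is shattered, so no larger set is either
    return best

if __name__ == "__main__":
    n, d = 5, 2
    ball = hamming_ball(n, d)
    sauer_shelah = sum(comb(n, i) for i in range(d + 1))
    print(len(ball), sauer_shelah, vc_dimension(ball))
```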

Gärtner and Welzl [35] gave a combinatorial characterization of maximum classes constructed
using generic halfspaces. As an application of their characterization they note that the
Hamming ball of radius $d$ is a maximum class that cannot be realized this way. By Lemma 19,
however, the Hamming ball of radius $d$ has sign rank at most $2d + 1$ (it is in fact exactly $2d + 1$).
It is therefore natural to ask whether every maximum class has sign rank which depends only
on $d$. A similar question was also asked by [14]. Theorem 8 in Section 2.2.1 gives a negative
answer to this question, even when $d = 2$ (when $d = 1$, by Theorem 2 the sign rank is at most
3).

^5 Maximum classes are distinguished from maximal classes: A maximum class has the largest possible size
among all classes of VC dimension $d$, and a maximal class is such that for every sign vector $v \notin C$, if $v$ is added
to $C$ then the VC dimension is increased.
In machine learning, maximum classes were studied extensively in the context of sample
compression schemes. A partial list of works in this context includes [?, 44, ?, ?, ?].
[58] constructed an unlabeled sample compression scheme for maximum classes. Their scheme
uses an approach suggested by [44] and their analysis resolved a conjecture from [44]. A
crucial part in their work is establishing the existence of an embedding of any maximum class
of VC dimension $d$ in an arrangement of piecewise-linear hyperplanes in $\mathbb{R}^d$. Theorem 8 below
shows that even for VC dimension 2, there are maximum classes $C \subseteq \{\pm 1\}^N$ of sign rank
$\Omega(N^{1/2}/\log N)$. Thus, in order to make the piecewise-linear arrangement in $\mathbb{R}^2$ linear, the
dimension of the space must significantly grow to $\Omega(N^{1/2}/\log N)$.


2.2 Explicit examples 

The spectral lower bound on sign rank gives many explicit examples of matrices with high sign
rank, which come from known constructions of expander graphs and combinatorial designs. A
rather simple such family of examples is finite projective geometries.

Let $d \ge 2$ and $n \ge 3$. Let $P$ be the set of points in a $d$ dimensional projective space of order
$n$, and let $H$ be the set of hyperplanes in the space. For $d = 2$, this is just a projective plane
with points and lines. It is known (see, e.g., [?]) that

$$|P| = |H| = N_{n,d} := n^d + n^{d-1} + \dots + n + 1 = \frac{n^{d+1} - 1}{n - 1}.$$

Let $A \in \{\pm 1\}^{P \times H}$ be the signed point-hyperplane incidence matrix:

$$A_{p,h} = \begin{cases} 1 & p \in h, \\ -1 & p \notin h. \end{cases}$$

Theorem 7. The matrix $A$ is $N \times N$ with $N = N_{n,d}$, its VC dimension is $d$, and its sign rank is
larger than

$$\frac{n^d - 1}{n^{(d-1)/2}(n - 1)}.$$

The theorem follows from known properties of projective spaces (see Section 3.4.1). A
slightly weaker (but asymptotically equivalent) lower bound on the sign rank of $A$ was given
by [?].

The sign rank of $A$ is at most $2N_{n,d-1} + 1 = O(N^{1-1/d})$, due to the observation in [5]
mentioned above. To see this, note that every point in the projective space is incident to $N_{n,d-1}$
hyperplanes.
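
For the planar case $d = 2$ the quantities entering the spectral bound of Theorem 6 can be checked by hand on a small example. The sketch below is an illustration under an extra assumption, namely that the order $p$ is prime, so that arithmetic modulo $p$ gives a field; the function names are not from the text. It builds the boolean point-line incidence matrix $B$ and verifies that $BB^T = pI + J$, which means the singular values of $B$ are $p + 1$ (once) and $\sqrt{p}$.

```python
from itertools import product
from math import sqrt

def projective_points(p):
    """Normalized representatives of the points of the projective plane over GF(p), p prime."""
    return [v for v in product(range(p), repeat=3)
            if any(v) and next(x for x in v if x) == 1]

def incidence(p):
    """Boolean point-line incidence matrix; lines use the same representatives as points."""
    pts = projective_points(p)
    return [[1 if sum(a * b for a, b in zip(x, h)) % p == 0 else 0 for h in pts]
            for x in pts]

def gram(B):
    """The matrix B * B^T."""
    n = len(B)
    return [[sum(B[i][k] * B[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    p = 3
    B = incidence(p)
    G = gram(B)
    N = len(B)
    assert N == p * p + p + 1
    assert all(sum(row) == p + 1 for row in B)            # B is (p+1) regular
    assert all(G[i][j] == (p + 1 if i == j else 1)        # B B^T = p I + J
               for i in range(N) for j in range(N))
    print((p + 1) / sqrt(p))  # Delta / sigma_2(B), the lower bound from Theorem 6
```

With $\Delta = p + 1$ and $\sigma_2(B) = \sqrt{p}$ the bound equals $(p+1)/\sqrt{p}$, which agrees with the expression in Theorem 7 for $d = 2$.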

Other explicit examples come from spectral graph theory. Here is a brief description of
matrices that are even more restricted than having VC dimension 2 but have high sign rank; no
3 columns in them have more than 6 distinct projections. An $(N, \Delta, \lambda)$-graph is a $\Delta$ regular
graph on $N$ vertices so that the absolute value of every eigenvalue of the graph besides the
top one is at most $\lambda$. There are several known constructions of $(N, \Delta, \lambda)$-graphs for which
$\lambda \le O(\sqrt{\Delta})$, that do not contain short cycles. Any such graph with $\Delta \ge N^{\Omega(1)}$ provides an
example with sign rank at least $N^{\Omega(1)}$, and if there is no cycle of length at most 6 then in the
sign matrix we have at most 6 distinct projections on any set of 3 columns.
2.2.1 Maximum classes

Let $P$ be the set of points in a projective plane of order $n$ and let $L$ be the set of lines in it. Let
$N = N_{n,2} = |P| = |L|$. For each line $\ell \in L$, fix some linear order on the points in $\ell$. A set
$T \subseteq P$ is called an interval if $T \subseteq \ell$ for some line $\ell \in L$, and $T$ forms an interval with respect
to the order we fixed on $\ell$.

Theorem 8. The class $R$ of all intervals is a maximum class of VC dimension 2. Moreover,
there exists a choice of linear orders for the lines in $L$ such that the resulting $R$ has sign rank
$\Omega(N^{1/2}/\log N)$.

The proof of Theorem 8 is given in Section 3.4.1. The proof does not follow directly from
Theorem 4 since it is not clear that the classes with VC dimension 2 and large sign rank which
are guaranteed to exist by Theorem 4 can be extended to a maximum class.
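
The first part of Theorem 8 can be checked on a small plane. The sketch below makes several illustrative assumptions: the order $p$ is prime, the linear order on each line is the index order of its points, and the empty set is counted as an interval (this is what makes the size match the Sauer-Shelah bound). It does not say anything about the sign rank part of the theorem, which depends on a careful choice of orders.

```python
from itertools import combinations, product
from math import comb

def projective_plane(p):
    """Number of points and the lines (as sorted lists of point indices), p prime."""
    pts = [v for v in product(range(p), repeat=3)
           if any(v) and next(x for x in v if x) == 1]
    lines = [[i for i, x in enumerate(pts)
              if sum(a * b for a, b in zip(x, h)) % p == 0] for h in pts]
    return len(pts), lines

def interval_class(p):
    """Sign vectors of all intervals, using index order on every line."""
    n_pts, lines = projective_plane(p)
    sets = {frozenset()}  # assumption: the empty set counts as an interval
    for line in lines:
        for i in range(len(line)):
            for j in range(i, len(line)):
                sets.add(frozenset(line[i:j + 1]))
    return n_pts, [[1 if q in s else -1 for q in range(n_pts)] for s in sets]

def vc_dimension(rows):
    """Brute-force VC dimension of a class given as a list of sign vectors."""
    n = len(rows[0])
    best = 0
    for k in range(1, n + 1):
        if any(len({tuple(r[j] for j in cols) for r in rows}) == 2 ** k
               for cols in combinations(range(n), k)):
            best = k
        else:
            break
    return best

if __name__ == "__main__":
    n_pts, R = interval_class(3)
    print(n_pts, len(R), 1 + n_pts + comb(n_pts, 2), vc_dimension(R))
```

Every pair of points lies on a common line, which is why every pair is shattered, while three points are either collinear (the two outer points cannot be selected without the middle one) or not contained in any single line.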

2.3 Computing the sign rank 

Linear Programming (LP) is one of the most famous and useful problems in the class P. As a
decision problem, an LP problem concerns determining the satisfiability of a system

$$\ell_i(x) \ge 0, \quad i = 1, \dots, m,$$

where each $\ell_i$ is an affine function defined over $\mathbb{R}^n$ (say with integer coefficients). A natural
extension of LP is to consider the case in which each $\ell_i$ is a multivariate polynomial. Perhaps
not surprisingly, this problem is much harder than LP. In fact, satisfiability of a system of polynomial
inequalities is known to be a complete problem for the class $\exists\mathbb{R}$. The class $\exists\mathbb{R}$ is known
to lie between NP and PSPACE (see [48] and references within).

Consider the problem of deciding whether the sign rank of a given $N \times N$ sign matrix is at
most $k$. A simple reduction shows that to solve this problem it is enough to decide whether a
system of real polynomial inequalities is satisfiable. Thus, this problem belongs to the class $\exists\mathbb{R}$.
[13]^6 and [11] showed that deciding if the sign rank is at most 3 is NP-hard, and that deciding
if the sign rank is at most 2 is in P. Both [13] and [11] established the NP-hardness of deciding
whether the sign-rank is at most 3 by a reduction from the problem of determining stretchability
of pseudo-line arrangements. This problem concerns whether a given combinatorial description
of an arrangement of pseudo-lines can be realized (“stretched”) by an arrangement of lines. [48],
based on the works of [50], [64], and [?], showed that determining stretchability of pseudo-line
arrangements is in fact $\exists\mathbb{R}$-complete. Therefore, it follows^7 that determining whether the
sign-rank is at most 3 is $\exists\mathbb{R}$-complete.
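
One way to spell out the simple reduction mentioned above, given here as a sketch rather than as the formulation used in the cited works, relies on the fact that a real matrix $M$ has rank at most $k$ exactly when $M_{i,j} = \langle u_i, v_j \rangle$ for some vectors $u_i, v_j \in \mathbb{R}^k$:

```latex
\[
  \text{sign-rank}(S) \le k
  \iff
  \exists\, u_1,\dots,u_N,\ v_1,\dots,v_N \in \mathbb{R}^k :\quad
  S_{i,j}\,\langle u_i, v_j\rangle > 0 \ \text{ for all } i,j \in [N].
\]
```

The right-hand side is a system of $N^2$ strict polynomial inequalities of degree 2 in $2Nk$ real unknowns, so deciding its satisfiability is a problem in the class of problems reducible to the existential theory of the reals.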

Another related work of [45] concerns the problem of computing the approximate rank of
a sign matrix, for which they provide an approximation algorithm. They pose the problem of
efficiently approximating the sign rank as an open problem.

^6 Interestingly, their motivation for considering sign rank comes from image processing.

^7 [48] considers a different type of combinatorial description than [13, 11], and therefore considered a different
formulation of the stretchability problem. However, it is possible to transform between these descriptions in
polynomial time.
Using an idea similar to the one in the proof of Theorem 5, we derive an approximation
algorithm for the sign rank (see Section 3.4.2).

Theorem 9. There exists a polynomial time algorithm that approximates the sign rank of a
given $N$ by $N$ matrix up to a multiplicative factor of $c \cdot N/\log(N)$ where $c > 0$ is a universal
constant.

2.4 Communication complexity 

We briefly explain the notions from communication complexity we use. For formal definitions,
background and more details, see the textbook [43].

For a function $f$ and a distribution $\mu$ on its inputs, define $D_\mu(f)$ as the minimum communication
complexity of a protocol that correctly computes $f$ with error 1/3 over inputs from
$\mu$. Define $D^{\times}(f) = \max\{D_\mu(f) : \mu \text{ is a product distribution}\}$. Define the unbounded error
communication complexity $U(f)$ of $f$ as the minimum communication complexity of a randomized
private-coin^8 protocol that correctly computes $f$ with probability strictly larger than
1/2 on every input.

Two works of [62, 63] showed that there are functions with small distributional communication
complexity under product distributions, and large unbounded error communication complexity.
In [63] the separation is as strong as possible but it is not for an explicit function, and
the separation in [62] is not as strong but the underlying function is explicit.

The matrix $A$ with $d = 2$ and $n \ge 3$ in our example from Section 2.2 corresponds to the
following communication problem: Alice gets a point $p \in P$, Bob gets a line $\ell \in L$, and they
wish to decide whether $p \in \ell$ or not. Let $f : P \times L \to \{0, 1\}$ be the corresponding function and
let $m = \lceil \log_2(N) \rceil$. A trivial protocol would be that Alice sends Bob using $m$ bits the name of
her point. Bob checks whether it is incident to the line, and outputs accordingly.
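
The trivial protocol is short enough to write out. The sketch below is purely illustrative: points are named by indices $0, \dots, N-1$, a line is represented as a set of point indices, and the specific line used in the demonstration is a hypothetical example.

```python
from math import ceil, log2

def alice_message(point_index, n_points):
    """Alice encodes the name of her point using m = ceil(log2(N)) bits."""
    m = ceil(log2(n_points))
    return format(point_index, f"0{m}b")

def bob_output(message, line):
    """Bob decodes the point and reports whether it lies on his line."""
    return 1 if int(message, 2) in line else 0

if __name__ == "__main__":
    n_points = 13                 # a projective plane of order 3 has 13 points
    line = {0, 1, 2, 3}           # hypothetical line, given as a set of point indices
    msg = alice_message(2, n_points)
    print(len(msg), bob_output(msg, line))
```

The cost of the protocol is the message length $m$, independent of which point and line the players hold.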

Theorem 7 implies the following consequences. Even if we consider protocols that use
randomness and are allowed to err with probability less than but arbitrarily close to 1/2, then
still one cannot do considerably better than the above trivial protocol. However, if the input
$(p, \ell) \in P \times L$ is distributed according to a product distribution then there exists an $O(1)$
protocol that errs with probability at most 1/3.

Corollary 10. The unbounded error communication complexity of $f$ is $U(f) \ge \frac{m}{4} - O(1)$. The
distributional communication complexity of $f$ under product distributions is $D^{\times}(f) \le O(1)$.

These two seemingly contradicting facts are a corollary of the high sign rank and the low
VC dimension of $A$, using two known results. The upper bound on $D^{\times}(f)$ follows from the fact
that $\mathrm{VCdim}(A) = 2$, and the work of [42] which used the PAC learning algorithm to construct
an efficient (one round) communication protocol for $f$ under product distributions. The lower
bound on $U(f)$ follows from the fact that $\text{sign-rank}(A) \ge \Omega(N^{1/4})$, and the result of [54] that showed
that unbounded error communication complexity is equivalent to the logarithm of the sign rank.
See [63] for more details.

^8 In the public-coin model, every boolean function has unbounded communication complexity at most two.

®By taking larger values of d, the constant | may be increased to ^ 
2.5 Counting VC classes

Let $c(N, d)$ denote the number of classes $C \subseteq \{\pm 1\}^N$ with VC dimension $d$. We give the
following estimate of $c(N, d)$ for constant $d$ and $N$ large enough. The proof is given in
Section 3.4.3.

Theorem 11. For every $d > 0$, there is $N_0 = N_0(d)$ such that for all $N > N_0$:

Let $m(N, d)$ denote the number of maximum classes $C \subseteq \{\pm 1\}^N$ of VC dimension $d$. The
problem of estimating $m(N, d)$ was proposed by [34]. We provide the following estimate (see
Section 3.4.3).

Theorem 12. For every $d > 1$, there is $N_0 = N_0(d)$ such that for all $N > N_0$:

$$N^{(1+o(1)) \frac{1}{d+1} \binom{N}{d}} \le m(N, d) \le N^{(1+o(1)) \sum_{i=1}^{d} \binom{N}{i}}.$$

The gap between our upper and lower bound is roughly a multiplicative factor of $d + 1$ in
the exponent. In the previous bounds given by [34] the gap was a multiplicative factor of $N$ in
the exponent.

2.6 Counting graphs 

Here we describe an application of our method for proving Theorem 5 to counting graphs with
a given forbidden substructure.

Let $G = (V, E)$ be a graph (not necessarily bipartite). The universal graph $U(d)$ is defined
as the bipartite graph with two color classes $A$ and $B = 2^A$ where $|A| = d$, and the edges are
defined as $\{a, b\}$ iff $a \in b$. The graph $G$ is called $U(d)$-free if for all two disjoint sets of vertices
$A, B \subset V$ so that $|A| = d$ and $|B| = 2^d$, the bipartite graph consisting of all edges of $G$ between
$A$ and $B$ is not isomorphic to $U(d)$. In Theorem 24 of [?], which improves Theorem 2 there, it
is proved that for $d \ge 2$, the number of $U(d+1)$-free graphs on $N$ vertices is at most

$$2^{O(N^{2-1/d} (\log N)^{d+2})}.$$
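
The definition of the universal graph is small enough to construct explicitly. The following sketch is only an illustration of the definition; the function name `universal_graph` and the representation of $B$ by frozensets are assumptions made for the example.

```python
from itertools import combinations

def universal_graph(d):
    """Bipartite graph U(d): side A has d vertices, side B is all subsets of A."""
    A = list(range(d))
    B = [frozenset(c) for k in range(d + 1) for c in combinations(A, k)]
    edges = {(a, b) for a in A for b in B if a in b}  # {a, b} is an edge iff a is in b
    return A, B, edges

if __name__ == "__main__":
    A, B, edges = universal_graph(3)
    print(len(A), len(B), len(edges))
```

Each vertex of $A$ belongs to exactly half of the subsets, so $U(d)$ has $d \cdot 2^{d-1}$ edges, and the neighborhoods of the vertices of $B$ realize every subset of $A$ exactly once.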


The proof in [?] is quite involved, consisting of several technical and complicated steps. Our
methods give a different, quick proof of an improved estimate, replacing the $(\log N)^{d+2}$ term
by a single $\log N$ term.

Theorem 13. For every fixed $d \ge 1$, the number of $U(d+1)$-free graphs on $N$ vertices is at
most $2^{O(N^{2-1/d} \log N)}$.

The proof of the theorem is given in Section 3.4.4.

2.7 Geometry 

Differences and similarities between finite geometries and real geometry are well known. An
example of a related problem is finding the minimum dimension of Euclidean space in which
we can embed a given finite plane (i.e. a collection of points and lines satisfying certain axioms).
By embed we mean that there are two one-to-one maps $e_P$, $e_L$ so that $e_P(p) \in e_L(\ell)$ iff $p \in \ell$ for
all $p \in P$, $\ell \in L$. The Sylvester-Gallai theorem shows, for example, that Fano's plane cannot be
embedded in any finite dimensional real space if points are mapped to points and lines to lines.

How about a less restrictive meaning of embedding? One option is to allow embedding
using half spaces, that is, an embedding in which points are mapped to points but lines are
mapped to half spaces. Such an embedding is always possible if the dimension is high enough:
Every plane with point set $P$ and line set $L$ can be embedded in $\mathbb{R}^P$ by choosing $e_P(p)$ as the
$p$'th unit vector, and $e_L(\ell)$ as the half space with positive projection on the vector with 1 on
points in $\ell$ and $-1$ on points outside $\ell$. The minimum dimension for which such an embedding
exists is captured by the sign rank of the underlying incidence matrix (up to a ±1).
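
The high dimensional embedding described above can be checked on Fano's plane. In the sketch below the labelling of the seven points by $0, \dots, 6$ and the listed lines are one standard presentation of the plane, chosen for illustration; the function names are not from the text.

```python
FANO_LINES = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
              {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]

def halfspace_embedding(n_points, lines):
    """Points map to unit vectors; each line maps to the normal vector of a half space."""
    points = [[1 if i == p else 0 for i in range(n_points)] for p in range(n_points)]
    normals = [[1 if i in line else -1 for i in range(n_points)] for line in lines]
    return points, normals

def in_halfspace(x, w):
    """Membership in the open half space {x : <x, w> > 0}."""
    return sum(a * b for a, b in zip(x, w)) > 0

if __name__ == "__main__":
    points, normals = halfspace_embedding(7, FANO_LINES)
    ok = all(in_halfspace(points[p], normals[l]) == (p in FANO_LINES[l])
             for p in range(7) for l in range(7))
    print(ok)
```

The inner product of the $p$'th unit vector with the vector of a line is just that vector's $p$'th coordinate, which is positive exactly when the point lies on the line.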

Corollary 14. A finite projective plane of order $n \ge 3$ cannot be embedded in $\mathbb{R}^k$ using half
spaces, unless $k > N^{1/4} - 1$ with $N = n^2 + n + 1$.

Roughly speaking, the corollary says that there are no efficient ways to embed finite planes 
in real space using half spaces. 


3 Proofs 

3.1 Duality 

Here we discuss the connection between VC dimension and dual sign rank. 

We start with an equivalent definition of dual sign rank, that is based on the following
notion. We say that a set of columns $C$ is antipodally shattered in a sign matrix $S$ if for each
$v \in \{\pm 1\}^C$, either $v$ or $-v$ appears as a row in the restriction of $S$ to the columns in $C$.

Claim 15. The set of columns C is antipodally shattered in S if and only if in every matrix M 
with sign{M) = S the columns in C are linearly independent. 

Proof. First, assume C is such that there exists some M with sign(M) = S in which the
columns in C are linearly dependent. For a column $j \in C$, denote by $M(j)$ the j’th column in
M. Let $\{a_j : j \in C\}$ be a set of real numbers, not all zero, so that $\sum_{j \in C} a_j M(j) = 0$.
Consider the vector $v \in \{\pm 1\}^C$ such that $v_j = 1$ if $a_j \ge 0$ and $v_j = -1$ if $a_j < 0$. The
restriction of S to C does not contain $v$ nor $-v$ as a row, which certifies that C is not antipodally
shattered by S.

Second, let C be a set of columns which is not antipodally shattered in S. Let $v \in \{\pm 1\}^C$ be
such that both $v, -v$ do not appear as a row in the restriction of S to C. Consider the subspace
$U = \{u \in \mathbb{R}^C : \sum_{j \in C} u_j v_j = 0\}$. For each sign vector $s \in \{\pm 1\}^C$ so that $s \ne \pm v$, the space
U contains some vector $u_s$ such that $\mathrm{sign}(u_s) = s$. Let M be so that sign(M) = S and in
addition for each row in S that has pattern $s \in \{\pm 1\}^C$ in S restricted to C, the corresponding
row in M restricted to C is $u_s \in U$. All rows in M restricted to C are in U, and therefore the
set $\{M(j) : j \in C\}$ is linearly dependent. □

Corollary 16. The dual sign rank of S is the maximum size of a set of columns that are
antipodally shattered in S.
Now, we prove the proposition relating VC dimension and dual sign rank, namely

$\mathrm{VC}(S) \le \text{dual-sign-rank}(S) \le 2\,\mathrm{VC}(S) + 1.$

The left inequality: The VC dimension of S is at most the maximum size of a set of columns
that is antipodally shattered in S, which by the above claim equals the dual sign rank of S.

The right inequality: Let C be a largest set of columns that is antipodally shattered in S.
By the claim above, the dual sign rank of S is $|C|$. Let $A \subseteq C$ such that $|A| = \lfloor |C|/2 \rfloor$. If
A is shattered in S then we are done. Otherwise, there exists some $v \in \{\pm 1\}^A$ that does not
appear in S restricted to A. Since C is antipodally shattered by S, this implies that S contains
all patterns in $\{\pm 1\}^C$ whose restriction to A is $-v$. In particular, S shatters $C \setminus A$ which is of
size at least $\lfloor |C|/2 \rfloor$.
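The two inequalities can be checked by brute force on small matrices. The following is a minimal sketch, assuming Corollary 16 is used as the working definition of dual sign rank; the helper names (`patterns`, `largest`, and so on) are illustrative and do not come from the paper.

```python
from itertools import combinations, product

def patterns(S, cols):
    """Set of row patterns of the sign matrix S restricted to the columns `cols`."""
    return {tuple(row[c] for c in cols) for row in S}

def is_shattered(S, cols):
    """All 2^|cols| sign patterns appear on `cols`."""
    return len(patterns(S, cols)) == 2 ** len(cols)

def is_antipodally_shattered(S, cols):
    """For every pattern v on `cols`, either v or -v appears as a row."""
    seen = patterns(S, cols)
    return all(v in seen or tuple(-x for x in v) in seen
               for v in product((1, -1), repeat=len(cols)))

def largest(S, predicate):
    """Largest size of a column set satisfying `predicate` (exponential brute force)."""
    n = len(S[0])
    best = 0
    for k in range(1, n + 1):
        if any(predicate(S, cols) for cols in combinations(range(n), k)):
            best = k
    return best

# Example: all sign vectors of length 3 whose first entry is +1.
S = [(1, a, b) for a in (1, -1) for b in (1, -1)]
vc = largest(S, is_shattered)                  # VC dimension
dsr = largest(S, is_antipodally_shattered)     # dual sign rank, via Corollary 16
print(vc, dsr, vc <= dsr <= 2 * vc + 1)
```

In this example the three columns are antipodally shattered although only two of them are shattered, so the left inequality can be strict.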

3.2 Sign rank versus VC dimension 

In this section we study the maximum possible sign rank of $N \times N$ matrices with VC dimension
d, presenting the proofs of Proposition 1 and Theorems 5 and 4. We also show that the
arguments supply a new, short proof and an improved estimate for a problem in asymptotic 
enumeration of graphs studied by flUl. 

3.2.1 VC dimension one 

Our goal in this section is to show that sign matrices with VC dimension one have sign rank at 
most 3, and that 3 is tight. Before reading this section, it may be a nice exercise to prove that 
the sign rank of the $N \times N$ signed identity matrix is exactly three (for $N \ge 4$).

Let us start by recalling a geometric interpretation of sign rank. Let M be an $R \times C$ sign
matrix. A d-dimensional embedding of M using half spaces consists of two maps $e_R, e_C$ so that
for every row $r \in [R]$ and column $c \in [C]$, we have that $e_R(r) \in \mathbb{R}^d$, $e_C(c)$ is a half space in
$\mathbb{R}^d$, and $M_{r,c} = 1$ iff $e_R(r) \in e_C(c)$. The important property for us is that if M has a d-dimensional
embedding using half spaces then its sign rank is at most d + 1. The +1 comes from the fact
that the hyperplanes defining the half spaces do not necessarily pass through the origin.

Our goal in this section is to embed M with VC dimension one in the plane using half
spaces. The embedding is constructive and uses the following known claim (see, e.g., Theorem
11 in [27]).

Claim 17 ([27]). Let M be an $R \times C$ sign matrix with VC dimension one so that no row appears
twice in it, and every column c is shattered (i.e. the two values ±1 appear in it). Then, there is
a column $c_0 \in [C]$ and a row $r_0 \in [R]$ so that $M_{r_0,c_0} \ne M_{r,c_0}$ for all $r \ne r_0$ in $[R]$.

Proof. For every column c, denote by $\mathrm{ones}_c$ the number of rows $r \in [R]$ so that $M_{r,c} = 1$, and
let $m_c = \min\{\mathrm{ones}_c, R - \mathrm{ones}_c\}$. Assume without loss of generality that $m_1 \le m_c$ for all c,
and that $m_1 = \mathrm{ones}_1$. Since all columns are shattered, $m_1 \ge 1$. To prove the claim, it suffices
to show that $m_1 \le 1$.

Assume towards a contradiction that $m_1 \ge 2$. For $b \in \{1, -1\}$, denote by $M^{(b)}$ the submatrix
of M consisting of all rows r so that $M_{r,1} = b$. The matrix $M^{(1)}$ has at least two rows. Since
all rows are different, there is a column $c \ne 1$ so that two rows in $M^{(1)}$ differ in c. Specifically,
column c is shattered in $M^{(1)}$. Since VCdim(M) = 1, it follows that c is not shattered in $M^{(-1)}$,
which means that the value in column c is the same for all rows of the matrix $M^{(-1)}$. Therefore,
$m_c < m_1$, which is a contradiction. □

The embedding we construct has an extra structure which allows the induction to go through:
The rows are mapped to points on the unit circle (i.e. set of points $x \in \mathbb{R}^2$ so that $\|x\| = 1$).

Lemma 18. Let M be an $R \times C$ sign matrix of VC dimension one so that no row appears twice
in it. Then, M can be embedded in $\mathbb{R}^2$ using half spaces, where each row is mapped to a point
on the unit circle.

The lemma immediately implies Theorem 2 due to the connection to sign rank discussed
above.

Proof. The proof follows by induction on C. If C = 1, the claim trivially holds.

The inductive step: If there is a column that is not shattered, then we can remove it, apply
induction, and then add a half space that either contains or does not contain all points, as
necessary. So, we can assume all columns are shattered. By Claim 17, we can assume without loss
of generality that $M_{1,1} = 1$ but $M_{r,1} = -1$ for all $r \ne 1$.

Denote by $r_0$ the row of M so that $M_{r_0,c} = M_{1,c}$ for all $c \ne 1$, if such a row exists. Let M'
be the matrix obtained from M by deleting the first column, and row $r_0$ if it exists, so that no
row in M' appears twice. By induction, there is an appropriate embedding of M' in $\mathbb{R}^2$.

The following is illustrated in Figure 1. Let $x \in \mathbb{R}^2$ be the point on the unit circle to which
the first row in M' was mapped (this row corresponds to the first row of M as well). The half
spaces in the embedding of M' are defined by lines, which mark the borders of the half spaces.
The unit circle intersects these lines in finitely many points. Let y, z be the two closest points
to x among all these intersection points. Let y' be the point on the circle in the middle between
x, y, and let z' be the point on the circle in the middle between x, z. Add to the configuration
one more half space which is defined by the line passing through y', z'. If in addition row $r_0$
exists, then map $r_0$ to the point $x_0$ on the circle which is right in the middle between y, y'.

This is the construction. Its correctness follows by induction, by the choice of the last added
half space which separates x from all other points, and since if $x_0$ exists it belongs to the same
cell as x in the embedding of M'. □

We conclude the section by showing that the bound 3 above cannot be improved.

Proof of the claim. One may deduce the claim from Forster’s argument, but we provide a more
elementary argument. It suffices to consider the case N = 4. Consider an arrangement of
four half planes in $\mathbb{R}^2$. These four half planes partition $\mathbb{R}^2$ to eight cones with different sign
signatures, as illustrated in Figure 2. Let M be the $8 \times 4$ sign matrix whose rows are these sign
signatures. The rows of M form a distance preserving cycle (i.e. the distance along the cycle is
Hamming distance) of length eight in the discrete cube of dimension four.^{10}

Finally, the signed identity matrix is not a submatrix of M. To see this, note that the four
rows of the signed identity matrix have pairwise Hamming distance two, but there are no such
four points (not even three points) on this cycle of length eight.

□ 
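The cycle argument can be checked numerically for one concrete arrangement. The sketch below assumes four lines through the origin with equally spaced normals and one sample point inside each of the eight cones; this specific arrangement and the function names are illustrative choices, and the check does not replace the general argument.

```python
import math
from itertools import combinations

def sign_signatures(num_lines=4):
    """Sign signatures of the 2*num_lines cones cut out by lines through the origin."""
    normals = [(math.cos(k * math.pi / num_lines), math.sin(k * math.pi / num_lines))
               for k in range(num_lines)]
    rows = []
    for m in range(2 * num_lines):
        # one sample point strictly inside each cone, listed in cyclic order
        a = (2 * m + 1) * math.pi / (2 * num_lines)
        p = (math.cos(a), math.sin(a))
        rows.append(tuple(1 if p[0] * nx + p[1] * ny > 0 else -1 for nx, ny in normals))
    return rows

def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(x != y for x, y in zip(u, v))

rows = sign_signatures(4)
# Hamming distance equals distance along the cycle of length eight.
cycle_ok = all(hamming(rows[i], rows[j]) == min(abs(i - j), 8 - abs(i - j))
               for i in range(8) for j in range(8))
# No three signatures are pairwise at Hamming distance two.
triple = any(all(hamming(a, b) == 2 for a, b in combinations(t, 2))
             for t in combinations(rows, 3))
print(cycle_ok, triple)
```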

^{10} The graph with vertex set $\{\pm 1\}^4$ where every two vectors of Hamming distance one are connected by an edge.
Figure 2: An example of a neighbourhood of x. All other points in the embedding of M' are to the left
of y and right of z on the circle. The half space defined by the line through y', z' is coloured
light gray.

Figure 3: Four lines defining four half planes, and the corresponding eight sign signatures.

3.2.2 The upper bound 

In this subsection we prove Theorem 5. The proof is short, but requires several ingredients.
The first one has been mentioned already, and appears in [5]. For a sign matrix S, let SC(S)
denote the maximum number of sign changes (SC) along a column of S. Define $SC^*(S) =
\min SC(M)$ where the minimum is taken over all matrices M obtained from S by a permutation
of the rows.

Lemma 19 ([5]). For any sign matrix S, $\text{sign-rank}(S) \le SC^*(S) + 1$.
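As a small illustration of these definitions, the following sketch computes SC and, by brute force over all row orders, SC* for tiny matrices. The function names are hypothetical, and the exhaustive search is only an assumption-free way to evaluate the definition, not a practical algorithm.

```python
from itertools import permutations

def sign_changes(M):
    """SC(M): the maximum number of sign changes along a column of M."""
    return max(sum(1 for a, b in zip(M, M[1:]) if a[c] != b[c])
               for c in range(len(M[0])))

def sc_star(S):
    """SC*(S): minimum of SC over all row permutations (feasible only for tiny S)."""
    return min(sign_changes([S[i] for i in perm])
               for perm in permutations(range(len(S))))

def signed_identity(n):
    """+1 on the diagonal and -1 elsewhere."""
    return [[1 if i == j else -1 for j in range(n)] for i in range(n)]

S = signed_identity(4)
print(sc_star(S))   # Lemma 19 then bounds the sign rank by this value plus one
```

For the 4 x 4 signed identity matrix every row order leaves some column with two sign changes, so the bound from Lemma 19 is 3, in line with the exercise mentioned in Section 3.2.1.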

Of course we can replace here rows by columns, but for our purpose the above version will
do. The second result we need is a theorem of [68] (see also [23]). As observed, for example,
in [49], plugging in its proof a result of [36] improves it by a logarithmic factor, yielding the
result we describe next. For a function g mapping positive integers to positive integers, we say
that a sign matrix S satisfies a primal shatter function g if for any integer t and any set I of
t columns of S, the number of distinct projections of the rows of S on I is at most $g(t)$. The
result of Welzl (after its optimization following [36]) can be stated as follows.^{11}

Lemma 20 ([68], see also [23, 49]). Let S be a sign matrix with N rows that satisfies the primal
shatter function $g(t) = ct^d$ for some constants $c > 0$ and $d > 1$. Then $SC^*(S) \le O(N^{1-1/d})$.

Proof of Theorem 5. Let S be an $N \times N$ sign matrix of VC dimension $d > 1$. By Sauer’s
lemma [60], it satisfies the primal shatter function $g(t) = O(t^d)$. Hence, by Lemma 20, $SC^*(S) \le O(N^{1-1/d})$.
Therefore, by Lemma 19, $\text{sign-rank}(S) \le O(N^{1-1/d})$. □


On the tightness of the argument. The proof of Theorem 5 works, with essentially no
change, for a larger class of sign matrices than the ones with VC dimension d. Indeed, the
proof shows that the sign rank of any $N \times N$ matrix with primal shatter function at most
$ct^d$ for some fixed c and $d > 1$ is at most $O(N^{1-1/d})$. In this statement the estimate is sharp for all
integers d, up to a logarithmic factor. This follows from a known construction which supplies
$N \times N$ boolean matrices so that the number of 1 entries in them is at least $\Omega(N^{2-1/d})$, and they
contain no d by $D = (d-1)! + 1$ submatrices of 1’s. These matrices satisfy the primal shatter
function $g(t) = D\binom{t}{d} + \sum_{i=0}^{d-1}\binom{t}{i}$ (with room to spare). Indeed, if we have more than that many
distinct projections on a set of t columns, we can omit all projections of weight at most $d - 1$.
Each additional projection contains 1’s in at least one set of size d, and the same d-set cannot be
covered more than D times. Plugging this matrix in the counting argument that gives a lower
bound for the sign rank using Lemma 22 proven below supplies an $\Omega(N^{1-1/d} / \log N)$ lower
bound for the sign rank of many $N \times N$ matrices with primal shatter function $O(t^d)$.

We have seen in Lemma 19 that sign rank is at most of order SC*. Moreover, for a fixed r,
many of the $N \times N$ sign matrices with sign rank at most r also have SC* at most r: Indeed, a
simple counting argument shows that the number of $N \times N$ sign matrices M with $SC(M) < r$
is

$2^{\Omega(rN \log N)},$

so, the set of $N \times N$ sign matrices with $SC^*(M) < r$ is a subset of size $2^{\Omega(rN \log N)}$ of all $N \times N$
sign matrices with sign rank at most r.

How many $N \times N$ matrices of sign rank at most r are there? By Lemma 22 proved in the
next section, this number is at most $2^{O(rN \log N)}$. So, the set of matrices with $SC^* < r$ is a rather
large subset of the set of matrices with sign rank at most r.

It is reasonable, therefore, to wonder whether an inequality in the other direction holds.
Namely, whether all matrices of sign rank r have SC* of order r. We now describe an example
which shows that this is far from being true, and also demonstrates the tightness of Lemma 20.
Namely, for every constant $d > 1$, there are $N \times N$ matrices S, which satisfy the primal shatter
function $g(t) = ct^d$ for a constant c, and on the other hand $SC^*(S) \ge \Omega(N^{1-1/d})$. Consider the
grid of points $P = [n]^d$ as a subset of $\mathbb{R}^d$. Denote by $e_1, \ldots, e_d$ the standard unit vectors in $\mathbb{R}^d$.

^{11} The statement in [68] and the subsequent papers is formulated in terms of somewhat different notions, but it
is not difficult to check that it is equivalent to the statement below.
For $i \in [n - 1]$ and $j \in [d]$, define the hyperplane $h_{i,j} = \{x : \langle x, e_j \rangle > i + (1/2)\}$. Denote by
H the set of these $d(n - 1)$ axis parallel hyperplanes. Let S be the $P \times H$ sign matrix defined
by P and H. That is, $S_{p,h} = 1$ iff $p \in h$. First, the matrix S satisfies the primal shatter function
$ct^d$, since every family of t hyperplanes partition $\mathbb{R}^d$ to at most $ct^d$ cells. Second, we show that

$SC^*(S) \ge \frac{n^d - 1}{d(n - 1)} \ge \frac{|P|^{1 - 1/d}}{d}.$

Indeed, fix some order on the rows of S, that is, order the points $P = \{p_1, \ldots, p_N\}$ with
$N = |P|$. The key point is that one of the hyperplanes $h_0 \in H$ is so that the number of
$i \in [N - 1]$ for which $S_{p_i,h_0} \ne S_{p_{i+1},h_0}$ is at least $(n^d - 1)/(d(n - 1))$: For each i there is at
least one hyperplane h that separates $p_i$ and $p_{i+1}$, that is, for which $S_{p_i,h} \ne S_{p_{i+1},h}$. The number
of such pairs of points is $n^d - 1$, and the number of hyperplanes is just $d(n - 1)$.
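The pigeonhole bound can be observed experimentally. The sketch below builds the grid matrix for small n and d and tests the inequality $SC \cdot d(n-1) \ge n^d - 1$ on random row orders; the function names and the use of random shuffles (rather than all orders) are assumptions made for illustration.

```python
from itertools import product
import random

def grid_sign_matrix(n, d):
    """Rows: points of [n]^d. Columns: half spaces {x : x_j > i + 1/2}, i in [n-1], j in [d]."""
    points = list(product(range(1, n + 1), repeat=d))
    planes = [(i, j) for j in range(d) for i in range(1, n)]
    return [[1 if p[j] > i + 0.5 else -1 for (i, j) in planes] for p in points]

def max_sign_changes(rows):
    """Maximum number of sign changes along a column, for the given row order."""
    return max(sum(1 for a, b in zip(rows, rows[1:]) if a[c] != b[c])
               for c in range(len(rows[0])))

def check_lower_bound(n, d, trials=20, seed=0):
    """Check SC * d(n-1) >= n^d - 1 on `trials` random orders of the points."""
    rng = random.Random(seed)
    rows = grid_sign_matrix(n, d)
    for _ in range(trials):
        rng.shuffle(rows)
        if max_sign_changes(rows) * d * (n - 1) < n ** d - 1:
            return False
    return True

print(check_lower_bound(4, 2), check_lower_bound(3, 3))
```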


3.2.3 The lower bound 

In this subsection we prove Theorem 4. Our approach follows the one of [5], which is based
on known bounds for the number of sign patterns of real polynomials. A similar approach has
been subsequently used by [14] to derive lower bounds for $f(N, d)$ for $d > 4$, but here we do it
in a slightly more sophisticated way and get better bounds.

Although we can use the estimate in [5] for the number of sign matrices with a given sign
rank, we prefer to describe the argument by directly applying a result of [67], described next.

Let $P = (P_1, P_2, \ldots, P_m)$ be a list of m real polynomials, each in $\ell$ variables. Define the
semi-variety

$V = V(P) = \{x \in \mathbb{R}^{\ell} : P_i(x) \ne 0 \text{ for all } 1 \le i \le m\}.$

For $x \in V$, the sign pattern of P at x is the vector

$(\mathrm{sign}(P_1(x)), \mathrm{sign}(P_2(x)), \ldots, \mathrm{sign}(P_m(x))) \in \{-1, 1\}^m.$

Let $s(P)$ be the total number of sign patterns of P as x ranges over all of V. This number is
bounded from above by the number of connected components of V.

Theorem 21 ([67]). Let $P = (P_1, P_2, \ldots, P_m)$ be a list of real polynomials, each in $\ell$ variables
and of degree at most k. If $m \ge \ell$ then the number of connected components of $V(P)$ (and
hence also $s(P)$) is at most $(4ekm/\ell)^{\ell}$.

An $N \times N$ matrix M is of rank at most r iff it can be written as a product $M = M_1 \cdot M_2$ of an
$N \times r$ matrix $M_1$ by an $r \times N$ matrix $M_2$. Therefore, each entry of M is a quadratic polynomial
in the $2Nr$ variables describing the entries of $M_1$ and $M_2$. We thus deduce the following from
Warren’s Theorem stated above. A similar argument has been used by [15].

Lemma 22. Let $r \le N/2$. Then, the number of $N \times N$ sign matrices of sign rank at most r
does not exceed $(O(N/r))^{2Nr} \le 2^{O(rN \log N)}$.
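For concreteness, here is one way to spell out the substitution into Theorem 21. This is a sketch of the computation under the parameter choice read off from the paragraph above ($m = N^2$ polynomials, $\ell = 2Nr$ variables, degree $k = 2$), not a quotation from the paper.

```latex
% m = N^2 polynomials (the entries of M_1 M_2), \ell = 2Nr variables, degree k = 2.
% The hypothesis m \ge \ell of Theorem 21 is exactly the condition r \le N/2.
\[
  \left(\frac{4ekm}{\ell}\right)^{\ell}
  = \left(\frac{4e \cdot 2 \cdot N^{2}}{2Nr}\right)^{2Nr}
  = \left(\frac{4eN}{r}\right)^{2Nr}
  = 2^{\,2Nr \log_2 (4eN/r)}
  \le 2^{O(rN \log N)} .
\]
```

Every sign matrix of sign rank at most r is the sign pattern of such a product at a point of V, so the number of these matrices is at most the number of sign patterns.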

For a fixed r, this bound for the logarithm of the above quantity is tight up to a constant
factor: As argued in Subsection 3.2.2, there are at least $2^{\Omega(rN \log N)}$ matrices of sign rank
r.
In order to derive the statement of Theorem 4 from the last lemma it suffices to show that
the number of $N \times N$ sign matrices of VC dimension d is sufficiently large. We proceed to do
so. It is more convenient to discuss boolean matrices in what follows (instead of their signed
versions).

Proof of Theorem 4. There are 4 parts as follows.

1. The case d = 2: Consider the $N \times N$ incidence matrix A of the projective plane with
N points and N lines, considered in the previous sections. The number of 1 entries in A is
$(1 + o(1))N^{3/2}$, and it does not contain $J_{2 \times 2}$ (the $2 \times 2$ all 1 matrix) as a submatrix, since there
is only one line passing through any two given points. Therefore, any matrix obtained from it
by replacing ones by zeros has VC dimension at most 2, since every matrix of VC dimension 3
must contain $J_{2 \times 2}$ as a submatrix. This gives us $2^{(1+o(1))N^{3/2}}$ distinct $N \times N$ sign matrices of
VC dimension at most 2. Lemma 22 therefore establishes the assertion of Theorem 4, part 1.

2. The case d = 3: Call a 5 x 4 binary matrix heavy if its rows are the all 1 row and the 4 
rows with Hamming weight 3. Call a 5 x 4 boolean matrix heavy-dominating if there is a heavy 
matrix which is smaller or equal to it in every entry. 

We claim that there is a boolean $N \times N$ matrix B so that the number of 1 entries in it
is at least $\Omega(N^{23/15})$, and it does not contain any heavy-dominating $5 \times 4$ submatrix. Given
such a matrix B, any matrix obtained from B by replacing some of the ones by zeros has VC
dimension at most 3. This implies part 2 of Theorem 4 using Lemma 22 as before.

The existence of B is proved by a probabilistic argument. Let C be a random binary matrix
in which each entry, randomly and independently, is 1 with probability $p = \frac{1}{2N^{7/15}}$. Let X be
the random variable counting the number of 1 entries of C minus twice the number of $5 \times 4$
heavy-dominant submatrices C contains. By linearity of expectation,

$E(X) \ge N^2 p - 2N^{4+5} p^{1 \cdot 4 + 4 \cdot 3} = \Omega(N^{23/15}).$

Fix a matrix C for which the value of X is at least its expectation. Replace at most two 1 entries
by 0 in each heavy-dominant $5 \times 4$ submatrix in C to get the required matrix B.

3. The case d = 4: The basic idea is as before, but here there is an explicit construction that
beats the probabilistic one. Indeed, [21] constructed an $N \times N$ boolean matrix B so that the
number of 1 entries in B is at least $\Omega(N^{5/3})$ and it does not contain $J_{3 \times 3}$ as a submatrix (see
also S for another construction). No set of 5 rows in every matrix obtained from this one by
replacing 1’s by 0’s can be shattered, implying the desired result as before.

4. The case d > 4: The proof here is similar to the one in part 2. We prove by a probabilistic
argument that there is an $N \times N$ binary matrix B so that the number of 1 entries in it is at least

$\Omega\left(N^{2 - (d^2 + 5d + 2)/(d^3 + 2d^2 + 3d)}\right)$

and it contains no heavy-dominant submatrix. Here, heavy-dominant means a $1 + (d+1) + \binom{d+1}{2}$
by d+1 matrix that is bigger or equal in each entry than the matrix whose rows are all the distinct
vectors of length d+1 and Hamming weight at least $d - 1$. Any matrix obtained by replacing
1’s by 0’s in B cannot have VC dimension exceeding d. The result follows, again, from Lemma
22.
We start as before with a random matrix C in which each entry, randomly and independently,
is chosen to be 1 with probability

$p = \frac{1}{2} N^{\frac{2 - 1 - (d+1) - \binom{d+1}{2} - (d+1)}{1 \cdot (d+1) + (d+1) \cdot d + \binom{d+1}{2}(d-1) - 1}} = \frac{1}{2N^{(d^2 + 5d + 2)/(d^3 + 2d^2 + 3d)}}.$

Let X be the random variable counting the number of 1 entries of C minus three times the
number of heavy-dominant submatrices C contains. As before, $E(X) \ge \Omega\left(N^{2 - (d^2 + 5d + 2)/(d^3 + 2d^2 + 3d)}\right)$, and by
deleting some of the 1’s in C we get B. □
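The exponents in parts 2 and 4 come from balancing the two terms of E(X). The sketch below recomputes them with exact rational arithmetic, assuming the pattern sizes described above (a 5 x 4 pattern with 16 ones for d = 3, and the weight-at-least-(d-1) pattern for d > 4); the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def heavy_shape(d):
    """Rows, columns and number of 1 entries of the pattern used in part 4."""
    cols = d + 1
    rows = 1 + (d + 1) + comb(d + 1, 2)                       # weights d+1, d, d-1
    ones = (d + 1) + (d + 1) * d + comb(d + 1, 2) * (d - 1)
    return rows, cols, ones

def balancing_exponent(rows, cols, ones):
    """The a for which p = N^(-a) makes N^2 p and N^(rows+cols) p^ones the same order."""
    return Fraction(rows + cols - 2, ones - 1)

def closed_form(d):
    """The exponent (d^2 + 5d + 2) / (d^3 + 2d^2 + 3d) appearing in part 4."""
    return Fraction(d * d + 5 * d + 2, d ** 3 + 2 * d * d + 3 * d)

print(balancing_exponent(5, 4, 16))           # part 2: exponent of 1/N in p
print(2 - balancing_exponent(5, 4, 16))       # part 2: exponent of N in E(X)
print(all(balancing_exponent(*heavy_shape(d)) == closed_form(d) for d in range(5, 12)))
```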

3.3 Sign rank and spectral gaps 

The lower bound on the sign rank uses Forster’s argument [30], who showed how to relate sign
rank to spectral norm. He proved that if S is an $N \times N$ sign matrix then

$\text{sign-rank}(S) \ge \frac{N}{\|S\|}.$

We would like to apply Forster’s theorem to the matrix S in our explicit examples. The spectral
norm of S, however, is too large to be useful: If S is $\Delta \le N/3$ regular and x is the all 1 vector
then $Sx = (2\Delta - N)x$ and so $\|S\| \ge N/3$. Applying Forster’s theorem to S yields that its sign
rank is $\Omega(1)$, which is not informative.

Our solution is based on the observation that Forster’s argument actually proves a stronger
statement. His proof works as long as the entries of the matrix are not too close to zero, as was
already noticed in [31]. We therefore use a variant of the spectral norm of a sign matrix S which
we call star norm and denote by^{12}

$\|S\|^* = \min\{\|M\| : M_{i,j}S_{i,j} \ge 1 \text{ for all } i, j\}.$

Three comments seem in place. (i) We do not think of the star norm as a norm. (ii) It is
always at most the spectral norm, $\|S\|^* \le \|S\|$. (iii) Every M in the above minimum satisfies
sign-rank(M) = sign-rank(S).

Theorem 23 ([31]). Let S be an $N \times N$ sign matrix. Then,

$\text{sign-rank}(S) \ge \frac{N}{\|S\|^*}.$

For completeness, in Section 3.3.2 we provide a short proof of this theorem (which uses the
main lemma from [30] as a black box). To get any improvement using this theorem, we must
have $\|S\|^* \ll \|S\|$. It is not a priori obvious that there is a matrix S for which this holds. The
following lemma shows that spectral gaps yield such examples.


^{12} The minimizer belongs to a closed subset of the bounded set $\{M : \|M\| \le \|S\|\}$.


Theorem 24. Let S be a $\Delta$ regular $N \times N$ sign matrix with $\Delta \le N/2$, and B its boolean
version. Then,

$\|S\|^* \le \frac{N \cdot \sigma_2(B)}{\Delta}.$

In other words, every regular sign matrix whose boolean version has a spectral gap has a
small star norm. Theorem 23 and Theorem 24 immediately imply the lower bound
$\text{sign-rank}(S) \ge \Delta / \sigma_2(B)$. Earlier we provided concrete examples of matrices with a spectral
gap, that have applications in communication complexity, learning theory and geometry.

Proof of Theorem 24. Define the matrix

$M = \frac{N}{\Delta} B - J.$

Observe that since $N \ge 2\Delta$ it follows that $M_{i,j}S_{i,j} \ge 1$ for all i, j. So,

$\|S\|^* \le \|M\|.$


Since B is regular, the all 1 vector y is a right singular vector of B with singular value $\Delta$.
Specifically, My = 0. For every x, write $x = x_1 + x_2$ where $x_1$ is the projection of x on y and
$x_2$ is orthogonal to y. Thus,

$\langle Mx, Mx \rangle = \langle Mx_2, Mx_2 \rangle = \frac{N^2}{\Delta^2} \langle Bx_2, Bx_2 \rangle.$

Note that $\|B\| \le \Delta$ (and hence $\|B\| = \Delta$). Indeed, since B is regular, there are $\Delta$ permutation
matrices $B^{(1)}, \ldots, B^{(\Delta)}$ so that B is their sum. The spectral norm of each $B^{(i)}$ is one. The
desired bound follows by the triangle inequality.

Finally, since $x_2$ is orthogonal to y,

$\|Bx_2\| \le \sigma_2(B) \cdot \|x_2\| \le \sigma_2(B) \cdot \|x\|.$

So,

$\|M\| \le \frac{N \cdot \sigma_2(B)}{\Delta}.$

□
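The two algebraic facts used at the start of the proof (that M is feasible for the star norm and that the all 1 vector lies in its kernel) can be checked exactly on a small regular matrix. The sketch below uses a 7 x 7 circulant with three ones per row as an assumed example; the function names are illustrative.

```python
from fractions import Fraction

def circulant(n, offsets):
    """Boolean n x n circulant matrix with ones at column (i + s) mod n, s in offsets."""
    return [[1 if (j - i) % n in offsets else 0 for j in range(n)] for i in range(n)]

def star_norm_witness(B):
    """M = (N / Delta) * B - J for a Delta regular boolean matrix B, in exact rationals."""
    n = len(B)
    delta = sum(B[0])
    assert all(sum(row) == delta for row in B)                              # row regular
    assert all(sum(B[i][j] for i in range(n)) == delta for j in range(n))   # column regular
    scale = Fraction(n, delta)
    return [[scale * b - 1 for b in row] for row in B]

B = circulant(7, {0, 1, 3})                    # Delta = 3 <= N / 2
S = [[2 * b - 1 for b in row] for row in B]    # signed version of B
M = star_norm_witness(B)

feasible = all(M[i][j] * S[i][j] >= 1 for i in range(7) for j in range(7))
kernel = all(sum(row) == 0 for row in M)       # M y = 0 for the all 1 vector y
print(feasible, kernel)
```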


3.3.1 Limitations 

It is interesting to understand whether the approach above can give a better lower bound on sign 
rank. There are two parts to the argument: Forster’s argument, and the upper bound on $\|S\|^*$.
We can try to separately improve each of the two parts. 

Any improvement over Forster’s argument would be very interesting, but as mentioned there 
is no significant improvement over it even without the restriction induced by VC dimension, so 
we do not discuss it further. 

To improve the second part, we would like to find examples with the biggest spectral gap
possible. The Alon-Boppana theorem [15^ optimally describes limitations on spectral gaps. The
second eigenvalue $\sigma$ of a $\Delta$ regular graph is not too small,

$\sigma \ge 2\sqrt{\Delta - 1} - o(1),$

where the $o(1)$ term vanishes when N tends to infinity (a similar statement holds when the
diameter is large IfSBl). Specifically, the best lower bound on sign rank this approach can yield
is roughly $\sqrt{\Delta}/2$, at least when Δ <

But what about general lower bounds on $\|S\|^*$? It is well known that any $N \times N$ sign matrix
S satisfies $\|S\| \ge \sqrt{N}$. We prove a generalization of this statement.


Lemma 25. Let S be an $N \times N$ sign matrix. For $i \in [N]$, let $\gamma_i$ be the minimum between the
number of 1’s and the number of $-1$’s in the i’th row. Let $\gamma = \gamma(S) = \max\{\gamma_i : i \in [N]\}$.
Then,

$\|S\|^* \ge \frac{N - \gamma}{\sqrt{\gamma} + 1}.$


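This lemma provides limitations on the bound from Theorem 24. Indeed, $\gamma(S) \le N/2$ and
$\frac{N - \gamma}{\sqrt{\gamma} + 1}$ is a monotone decreasing function of $\gamma$, which implies $\|S\|^* \ge \Omega(\sqrt{N})$. Interestingly,
Lemma 25 and Theorem 24 provide a quantitatively weaker but a more general statement than
the Alon-Boppana theorem: If B is a $\Delta$ regular $N \times N$ boolean matrix with $\Delta \le N/2$, then

$\frac{N \cdot \sigma_2(B)}{\Delta} \ge \frac{N - \Delta}{\sqrt{\Delta} + 1} \quad \Longrightarrow \quad \sigma_2(B) \ge \left(1 - \frac{\Delta}{N}\right)\left(\sqrt{\Delta} - 1\right).$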


This bound is off by roughly a factor of two when the diameter of the graph is large. When the 
diameter is small, like in the case of the projective plane which we discuss in more detail below, 
this bound is actually almost tight: The second largest singular value of the boolean point-line 
incidence matrix of a projective plane of order n is $\sqrt{n}$ while this matrix is n + 1 regular (c.f.,
e.g., US!).

It is perhaps worth noting that in fact here there is a simple argument that gives a slightly
stronger result for boolean regular matrices. The sum of squares of the singular values of B
is the trace of $B^T B$, which is $N\Delta$. As the spectral norm is $\Delta$, the sum of squares of the other
singular values is $N\Delta - \Delta^2 = \Delta(N - \Delta)$, implying that

$\sigma_2(B) \ge \sqrt{\frac{\Delta(N - \Delta)}{N - 1}},$

which is (slightly) larger than the bound above.

Proof of Lemma 25. Let M be a matrix so that $\|M\| = \|S\|^*$ and $M_{i,j}S_{i,j} \ge 1$ for all i, j.
Assume without loss of generality^{13} that $\gamma_i$ is the number of $-1$’s in the i’th row of S. If $\gamma = 0$,
then S has only positive entries which implies $\|M\| \ge N$ as claimed. So, we may assume
$\gamma \ge 1$. Let t be the largest real so that

$t^2 = \frac{(N - \gamma - t)^2}{\gamma}.$    (1)

That is, if $\gamma = 1$ then $t = \frac{N - 1}{2}$, and if $\gamma > 1$ then

$t = \frac{-(N - \gamma) + \sqrt{(N - \gamma)^2 + (\gamma - 1)(N - \gamma)^2}}{\gamma - 1}.$

In both cases,

$t = \frac{N - \gamma}{\sqrt{\gamma} + 1}.$

^{13} Multiplying a row by $-1$ does not affect $\|S\|^*$.


We shall prove that

$\|M\| \ge t.$

There are two cases to consider. One is that for all $i \in [N]$ we have $\sum_j M_{i,j} \ge t$. In this case,
if x is the all 1 vector then

$\|M\| \ge \frac{\|Mx\|}{\|x\|} \ge t.$


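The second case is that there is $i \in [N]$ so that $\sum_j M_{i,j} < t$. Assume without loss of generality
that $i = 1$. Denote by C the subset of the columns j so that $M_{1,j} < 0$. Thus,

$\sum_{j \in C} |M_{1,j}| > \sum_{j \notin C} M_{1,j} - t$
$\ge |[N] \setminus C| - t$    ($|M_{i,j}| \ge 1$ for all i, j)
$\ge N - \gamma - t.$    ($|C| \le \gamma$)

Convexity of $x \mapsto x^2$ implies that

$\Big(\sum_{j \in C} |M_{1,j}|\Big)^2 \le |C| \sum_{j \in C} M_{1,j}^2,$

so by (1)

$\sum_j M_{1,j}^2 > \frac{(N - \gamma - t)^2}{\gamma} = t^2.$

In this case, if x is the vector with 1 in the first entry and 0 in all other entries then

$\|M^T x\| = \sqrt{\sum_j M_{1,j}^2} \ge t = t\|x\|.$

Since $\|M^T\| = \|M\|$, it follows that $\|M\| \ge t$. □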


□ 


3.3.2 Forster’s theorem 

Here we provide a proof of Forster’s theorem, that is based on the following key lemma, which 
he proved. 
Lemma 26 ([30]). Let $X \subset \mathbb{R}^k$ be a finite set in general position, i.e., every k vectors in it are
linearly independent. Then, there exists an invertible matrix B so that

$\sum_{x \in X} \frac{1}{\|Bx\|^2} Bx \otimes Bx = \frac{|X|}{k} I,$

where I is the identity matrix, and $Bx \otimes Bx$ is the rank one matrix with (i, j) entry $(Bx)_i (Bx)_j$.

The lemma shows that every X in general position can be linearly mapped to BX that is,
in some sense, equidistributed. In a nutshell, the proof of the lemma is by finding $B_1, B_2, \ldots$ so
that each $B_i$ makes $B_{i-1}X$ closer to being equidistributed, and finally using that the underlying
object is compact, so that this process reaches its goal.

Proof of Theorem 23. Let M be a matrix so that $\|M\| = \|S\|^*$ and $M_{i,j}S_{i,j} \ge 1$ for all i, j.
Clearly, sign-rank(S) = sign-rank(M). Let X, Y be two subsets of size N of unit vectors in $\mathbb{R}^k$
with k = sign-rank(M) so that $\langle x, y \rangle M_{x,y} > 0$ for all x, y. Lemma 26 says that we can assume

$\sum_{x \in X} x \otimes x = \frac{N}{k} I.$    (2)

If necessary replace X by BX and Y by $(B^T)^{-1} Y$, and then normalize (the assumption required
in the lemma that X is in general position may be obtained by a slight perturbation of its
vectors).

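The proof continues by bounding $D = \sum_{x \in X, y \in Y} M_{x,y} \langle x, y \rangle$ in two different ways.

First, bound D from above: Observe that for every two vectors u, v, the Cauchy-Schwarz
inequality implies

$\langle Mu, v \rangle \le \|Mu\| \|v\| \le \|M\| \|u\| \|v\|.$    (3)

Thus,

$D = \sum_{i=1}^{k} \sum_{x \in X} \sum_{y \in Y} M_{x,y} x_i y_i$
$\le \sum_{i=1}^{k} \|M\| \sqrt{\sum_{x \in X} x_i^2} \sqrt{\sum_{y \in Y} y_i^2}$    (by (3))
$\le \|M\| \sqrt{\sum_{i=1}^{k} \sum_{x \in X} x_i^2} \sqrt{\sum_{i=1}^{k} \sum_{y \in Y} y_i^2} = \|M\| N.$    (Cauchy-Schwarz)

Second, bound D from below: Since $|M_{x,y}| \ge 1$ and $|\langle x, y \rangle| \le 1$ for all x, y, using (2),

$D = \sum_{x \in X} \sum_{y \in Y} M_{x,y} \langle x, y \rangle \ge \sum_{x \in X} \sum_{y \in Y} \langle x, y \rangle^2 = \sum_{y \in Y} \Big\langle \Big(\sum_{x \in X} x \otimes x\Big) y, y \Big\rangle = \frac{N}{k} \sum_{y \in Y} \langle y, y \rangle = \frac{N^2}{k}.$

Combining the two bounds gives $N^2 / k \le \|M\| N$, that is, $k \ge N / \|S\|^*$. □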
3.4 Applications

3.4.1 Explicit examples 

Here we prove Theorem 7 and Theorem 8.

Proof of Theorem 7. It is well known that the VC dimension of A is d, but we provide a brief
explanation. The VC dimension is at least d by considering any set of d independent points (i.e.
so that no strict subset of it spans it). The VC dimension is at most d since every set of d + 1
points is dependent in a d dimensional space.

The lower bound on the sign rank follows immediately from Theorem 23, Theorem 24 and the
following known bound on the spectral gap of these matrices.

Lemma 27. If B is the boolean version of A then

$\frac{\sigma_2(B)}{\Delta} = \frac{n^{(d-1)/2}(n - 1)}{n^d - 1}.$


The proof is so short that we include it here.

Proof. We use the following two known properties (see, e.g., ifT^) of projective spaces. Both
the number of distinct hyperplanes through a point and the number of distinct points on a
hyperplane are $N_{n,d-1}$. The number of hyperplanes through two distinct points is $N_{n,d-2}$.

The first property implies that A is $\Delta = N_{n,d-1}$ regular. These properties also imply

$BB^T = (N_{n,d-1} - N_{n,d-2}) I + N_{n,d-2} J = n^{d-1} I + N_{n,d-2} J,$

where J is the all 1 matrix. Therefore, all singular values except the maximum one are $n^{(d-1)/2}$. □

□ 
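For d = 2 the identity in the proof reads $BB^T = nI + J$, which can be verified directly on small planes. The sketch below assumes a prime order q and builds the plane from normalized vectors in $\mathbb{F}_q^3$; this construction and the function names are illustrative and are not taken from the paper.

```python
def projective_plane(q):
    """Boolean point-line incidence matrix of the projective plane of prime order q."""
    # One representative per projective point: the first nonzero coordinate equals 1.
    reps = [(1, a, b) for a in range(q) for b in range(q)]
    reps += [(0, 1, a) for a in range(q)]
    reps.append((0, 0, 1))
    # Lines are indexed by the same representatives; incidence means zero dot product mod q.
    return [[1 if sum(x * y for x, y in zip(p, l)) % q == 0 else 0 for l in reps]
            for p in reps]

def gram(B):
    """The matrix B B^T, with integer entries."""
    n = len(B)
    return [[sum(B[i][k] * B[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

q = 3
B = projective_plane(q)
N = len(B)                                   # q^2 + q + 1 points
G = gram(B)
# B B^T = q I + J: diagonal entries q + 1, off-diagonal entries 1.
identity_holds = all(G[i][j] == (q + 1 if i == j else 1)
                     for i in range(N) for j in range(N))
print(N, identity_holds)
```

Since the nontrivial eigenvalues of $BB^T$ are all equal to q, the second singular value is $\sqrt{q}$ and the resulting sign rank bound $\Delta / \sigma_2(B)$ is $(q+1)/\sqrt{q}$.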


Proof of Theorem 8. We first show that R is indeed a maximum class of VC dimension 2. The
VC dimension of R is 2: It is at least 2 because R contains the set of lines whose VC dimension
is 2. It is at most 2 because no three points $p_1, p_2, p_3$ are shattered. Indeed if they all belong to
a line $\ell$ then without loss of generality according to the order of $\ell$ we have $p_1 < p_2 < p_3$ which
implies that the pattern 101 is missing. Otherwise, they are not co-linear and the pattern 111 is
missing.

To see that R is a maximum class, note that there are exactly N + 1 intervals of size at
most one (one empty interval and N singletons). For each line $\ell \in L$, the number of intervals
of size at least two which are subsets of $\ell$ is exactly $\binom{|\ell|}{2} = \binom{n+1}{2}$. Since every two distinct
lines intersect in exactly one point, it follows that each interval of size at least two is a subset of
exactly one line. It follows that the number of intervals is

$1 + N + N \cdot \binom{n+1}{2} = 1 + N + \binom{N}{2}.$

Thus, R is indeed a maximum class of VC dimension 2.
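The counting argument can be replayed on the Fano plane (n = 2, N = 7). In the sketch below the lines are the translates of {0, 1, 3} modulo 7 and each line is ordered as listed; both choices, and the function names, are assumptions made for illustration.

```python
from itertools import combinations

def interval_class(lines):
    """Empty set, singletons, and all intervals of size at least two of the ordered lines."""
    points = {p for line in lines for p in line}
    cls = {frozenset()} | {frozenset([p]) for p in points}
    for line in lines:                        # a line is a tuple listing its points in order
        for a in range(len(line)):
            for b in range(a + 2, len(line) + 1):
                cls.add(frozenset(line[a:b]))
    return cls

def is_shattered(cls, A):
    """All 2^|A| subsets of A appear as traces of members of the class."""
    A = frozenset(A)
    return len({c & A for c in cls}) == 2 ** len(A)

fano = [tuple((i + s) % 7 for s in (0, 1, 3)) for i in range(7)]
R = interval_class(fano)

print(len(R))                                                         # 1 + N + binom(N, 2)
print(any(is_shattered(R, A) for A in combinations(range(7), 3)))     # no shattered triple
```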

Next we show that there exists a choice of a linear order for each line such that the resulting
R has sign rank $\Omega(N^{1/2} / \log N)$. By the proof of Theorem 4, case d = 2, there is a choice of a
subset for each line such that the resulting N subsets form a class of sign rank $\Omega(N^{1/2} / \log N)$.
We can therefore pick the linear orders in such a way that each of these N subsets forms an
interval, and the resulting maximum class (of all possible intervals with respect to these orders)
has sign rank at least as large as $\Omega(N^{1/2} / \log N)$. □

3.4.2 Computing the sign rank 

In this section we describe an efficient algorithm that approximates the sign rank (Theorem 9).

The algorithm uses the following notion. Let V be a set. A pair $v, u \in V$ is crossed by a
vector $c \in \{\pm 1\}^V$ if $c(v) \ne c(u)$. Let T be a tree with vertex set V = [N] and edge set E. Let
S be a $V \times [N]$ sign matrix. The stabbing number of T in S is the largest number of edges in
T that are crossed by the same column of S. For example, if T is a path then T defines a linear
order (permutation) on V and the stabbing number is the largest number of sign changes among
all columns with respect to this order.

Welzl [68] gave an efficient algorithm for computing a path T with a low stabbing number
for matrices S with VC dimension d. The analysis of the algorithm can be improved by a
logarithmic factor using a result of [36].

Theorem 28 ([68], [36]). There exists a polynomial time algorithm such that given a $V \times [N]$
sign matrix S with $|V| = N$, outputs a path on V with stabbing number at most $200N^{1-1/d}$
where d = VC(S).

For completeness, and since to the best of our knowledge no explicit proof of this theorem 
appears in print, we provide a description and analysis of the algorithm. We assume without 
loss of generality that the rows of S are pairwise distinct. 

We start by handling the case^{14} d = 1. In this case, we directly output a tree that is a path
(i.e., a linear order on V). If d = 1, then Claim 17 implies that there is a column with at most
2 sign changes with respect to any order on V. The algorithm first finds by recursion a path T
for the matrix obtained from S by removing this column, and outputs the same path T for the
matrix S as well. By induction, the resulting path has stabbing number at most 2 (when there is
a single column the stabbing number can be made 1).

For $d > 1$, the algorithm constructs a sequence of $N$ forests $F_0, F_1, \ldots, F_{N-1}$ over the same
vertex set $V$. The forest $F_i$ has exactly $i$ edges, and is defined by greedily adding an edge $e_i$ to
$F_{i-1}$. As we prove below, the tree $F_{N-1}$ has a stabbing number at most $100N^{1-1/d}$. The tree
$F_{N-1}$ is transformed to a path $T$ as follows. Let $v_1, v_2, \ldots, v_{2N-1}$ be an Eulerian path in the
graph obtained by doubling every edge in $F_{N-1}$. This path traverses each edge of $F_{N-1}$ exactly
twice. Let $S'$ be the matrix with $2N - 1$ rows and $N$ columns obtained from $S$ by putting row
$v_i$ in $S$ as row $i$, for $i \in [2N - 1]$. The number of sign changes in each column in $S'$ is at most
$2 \cdot 100N^{1-1/d}$. Finally, let $T$ be the path obtained from the Eulerian path by leaving a single copy
of each row of $S$. Since deleting rows from $S'$ cannot increase the number of sign changes, the
path $T$ is as stated.
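The tree-to-path step can be sketched as a depth-first walk followed by a shortcut. This is a minimal illustration under the assumption that the tree is given as a list of edges on the vertices 0, ..., n − 1; the helper names `euler_walk` and `shortcut` are hypothetical.

```python
def euler_walk(tree_edges, n, root=0):
    """Closed walk that traverses every tree edge exactly twice (2n - 1 vertices)."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    walk = [root]
    # Iterative depth-first search; each entry is (vertex, parent, neighbor iterator).
    stack = [(root, None, iter(adj[root]))]
    while stack:
        v, parent, neighbors = stack[-1]
        advanced = False
        for w in neighbors:
            if w != parent:
                walk.append(w)                    # go down the edge {v, w}
                stack.append((w, v, iter(adj[w])))
                advanced = True
                break
        if not advanced:
            stack.pop()
            if stack:
                walk.append(stack[-1][0])         # come back up to the parent
    return walk


def shortcut(walk):
    """Keep only the first occurrence of each vertex, which yields a path on V."""
    seen, order = set(), []
    for v in walk:
        if v not in seen:
            seen.add(v)
            order.append(v)
    return order


tree = [(0, 1), (1, 2), (1, 3)]
walk = euler_walk(tree, 4)
order = shortcut(walk)
print(walk, order)
```

Each column changes sign along the walk at most twice per tree edge it crosses, and dropping repeated vertices cannot create new sign changes, which is the factor of 2 in the bound above.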

*This analysis also provides an alternative proof for Lemma [T^
The edge $e_i$ is chosen as follows. The algorithm maintains a probability distribution $p_i$ on
$[N]$. The weight $w_i(e)$ of the pair $e = \{v, u\}$ is the probability mass of the columns $e$ crosses,
that is, $w_i(e) = p_i(\{j \in [N] : S_{u,j} \neq S_{v,j}\})$. The algorithm chooses $e_i$ as an edge with
minimum $w_i$-weight among all edges that are not in $F_{i-1}$ and do not close a cycle in $F_{i-1}$.

The distributions $p_1, \ldots, p_N$ are chosen iteratively as follows. The first distribution $p_1$ is the
uniform distribution on $[N]$. The distribution $p_{i+1}$ is obtained from $p_i$ by doubling the relative
mass of each column that is crossed by $e_i$. That is, let $x_i = w_i(e_i)$, and for every column $j$ that
is crossed by $e_i$ define $p_{i+1}(j) = \frac{2p_i(j)}{1 + x_i}$, and for every other column $j$ define $p_{i+1}(j) = \frac{p_i(j)}{1 + x_i}$.
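The greedy selection and the reweighting rule can be sketched as follows. This is a straightforward quadratic-search illustration rather than an optimized implementation; the function name `low_stabbing_tree`, the component-label bookkeeping, and the use of exact fractions are assumptions made for the example.

```python
from fractions import Fraction


def low_stabbing_tree(S):
    """Greedy spanning tree with multiplicative reweighting of the columns.

    S is a list of pairwise distinct rows with +1/-1 entries.
    Returns the chosen edges e_1, ..., e_{N-1} and the final distribution.
    """
    n_rows, n_cols = len(S), len(S[0])
    p = [Fraction(1, n_cols)] * n_cols          # p_1: uniform on the columns
    component = list(range(n_rows))             # component label of each vertex
    edges = []
    for _ in range(n_rows - 1):
        best = None
        for u in range(n_rows):
            for v in range(u + 1, n_rows):
                if component[u] == component[v]:
                    continue                    # this pair would close a cycle
                w = sum(p[j] for j in range(n_cols) if S[u][j] != S[v][j])
                if best is None or w < best[0]:
                    best = (w, u, v)
        x, u, v = best                          # x plays the role of x_i = w_i(e_i)
        edges.append((u, v))
        # Merge the two components joined by the new edge.
        old, new = component[v], component[u]
        component = [new if c == old else c for c in component]
        # Double the mass of the crossed columns, then renormalize by 1 + x.
        p = [(2 * p[j] if S[u][j] != S[v][j] else p[j]) / (1 + x)
             for j in range(n_cols)]
    return edges, p


# Hypothetical example: row v has +1 exactly in the columns j <= v.
S = [[1 if j <= v else -1 for j in range(4)] for v in range(4)]
edges, p = low_stabbing_tree(S)
print(edges, p)
```

Dividing by $1 + x_i$ keeps each $p_{i+1}$ a probability distribution, since the crossed columns carry total mass $x_i$ before doubling.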

This algorithm clearly produces a tree on $V$, and the running time is indeed polynomial in
$N$. It remains to prove correctness. We claim that each column is crossed by at most $O(N^{1-1/d})$
edges in $T$. To see this, let $j$ be a column in $S$, and let $k$ be the number of edges crossing $j$. It
follows that

$$p_N(j) = \frac{1}{N} \cdot \frac{2^k}{(1 + x_1)(1 + x_2) \cdots (1 + x_{N-1})}.$$

To upper bound k, we use the following claim. 

Claim 29. For every $i$ we have $x_i \le 4e^2 (N - i)^{-1/d}$.

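The claim completes the proof of Theorem 28. Since $p_N(j) \le 1$ and $d > 1$,

$$
\begin{aligned}
k &\le \log N + \log(1 + x_1) + \cdots + \log(1 + x_{N-1}) \\
  &\le \log N + 2\bigl(\ln(1 + x_1) + \cdots + \ln(1 + x_{N-1})\bigr) && (\forall x \ge 1:\ \log(x) \le 2\ln(x)) \\
  &\le \log N + 2(x_1 + \cdots + x_{N-1}) \\
  &\le \log N + 8e^2 \sum_{i=1}^{N-1} (N - i)^{-1/d} = O(N^{1-1/d}).
\end{aligned}
$$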
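The last step of this chain combines Claim 29 with an integral comparison. A sketch of the computation, using only that $d > 1$ (so that $t^{-1/d}$ is integrable at $0$ and $d/(d-1) \le 2$):

```latex
\begin{align*}
\sum_{i=1}^{N-1} x_i
  &\le 4e^2 \sum_{i=1}^{N-1} (N-i)^{-1/d}
   = 4e^2 \sum_{m=1}^{N-1} m^{-1/d} \\
  &\le 4e^2 \int_{0}^{N-1} t^{-1/d}\,dt
   = 4e^2 \cdot \frac{d}{d-1}\,(N-1)^{1-1/d}
   = O\!\left(N^{1-1/d}\right).
\end{align*}
```

The comparison with the integral holds because $t^{-1/d}$ is decreasing, so $m^{-1/d} \le \int_{m-1}^{m} t^{-1/d}\,dt$ for every $m \ge 1$. Since $\log N = O(N^{1-1/d})$ for $d > 1$, this gives $k = O(N^{1-1/d})$.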


The claim follows from the following theorem of Haussler. 

Theorem 30 ([36]). Let $p$ be a probability distribution on $[N]$, and let $\epsilon > 0$. Let
$S \in \{\pm 1\}^{V \times [N]}$ be a sign matrix of VC dimension $d$ so that the $p$-distance between every two distinct
rows $u, v$ is large:

$$p(\{j \in [N] : S_{u,j} \neq S_{v,j}\}) \ge \epsilon.$$

Then, the number of distinct rows in $S$ is at most

$$e(d + 1)(2e/\epsilon)^d \le (4e^2/\epsilon)^d.$$

Proof of Claim 29. Haussler's theorem states that if the number of distinct rows is $M$, then
there must be two distinct rows of $p_i$-distance at most $4e^2 M^{-1/d}$. There are $N - i$ connected
components in $F_i$. Pick $N - i$ rows, one from each component. Therefore, there are two of these
rows whose distance is at most $4e^2 (N - i)^{-1/d}$. Now, observe that the $w_i$-weight
of the pair $\{u, v\}$ equals the $p_i$-distance between $u, v$. These two rows lie in different components
of $F_i$, and hence of $F_{i-1}$, so the pair joining them is a valid candidate for $e_i$. Since $e_i$ is chosen
to have minimum weight, $x_i \le 4e^2 (N - i)^{-1/d}$. □

We now describe the approximation algorithm. Let $S$ be an $N \times N$ sign matrix of VC
dimension $d$. Run Welzl's algorithm on $S$, and get a permutation of the rows of $S$ that yields a
low stabbing number. Let $s$ be the maximum number of sign changes among all columns of $S$
with respect to this permutation. Output $s + 1$ as the approximation to the sign rank of $S$.
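The output step is simple enough to sketch directly. The code below takes the row permutation as an input, so it does not depend on how that permutation was found. The second function illustrates one standard way to see why $s + 1$ bounds the sign rank: each column is written as a polynomial of degree at most $s$ in the row position. That argument is an assumption about the flavor of the cited lemma's proof, not a quotation of it, and the function names are hypothetical.

```python
def sign_changes(values):
    """Number of positions where consecutive entries differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def sign_rank_estimate(S, row_order):
    """Return s + 1, with s the largest number of sign changes of a column of S
    when the rows are read in the order `row_order`."""
    n_cols = len(S[0])
    s = max(sign_changes([S[v][j] for v in row_order]) for j in range(n_cols))
    return s + 1


def polynomial_witness(S, row_order):
    """Real matrix with the same signs as S; column j is a polynomial in the
    row position t whose degree is the number of sign changes of column j."""
    M = [[0.0] * len(S[0]) for _ in S]
    for j in range(len(S[0])):
        col = [S[v][j] for v in row_order]
        # One root halfway between every two consecutive rows of different sign.
        roots = [t + 0.5 for t in range(len(col) - 1) if col[t] != col[t + 1]]
        for t, v in enumerate(row_order):
            value = float(col[-1])      # sign to the right of all roots
            for r in roots:
                value *= (t - r)
            M[v][j] = value
    return M


# Hypothetical example: row v has +1 exactly in the columns j <= v.
S = [[1 if j <= v else -1 for j in range(4)] for v in range(4)]
print(sign_rank_estimate(S, [0, 1, 2, 3]), sign_rank_estimate(S, [0, 2, 1, 3]))
M = polynomial_witness(S, [0, 2, 1, 3])
```

Every column of the witness matrix lies in the span of the $s + 1$ vectors $(t^\ell)_t$ for $\ell = 0, \ldots, s$, so its rank is at most $s + 1$.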

We now analyze the approximation ratio. By Lemma 19 the sign rank of $S$ is at most $s + 1$.
Therefore, the approximation factor $\frac{s+1}{\text{sign-rank}(S)}$ is at least 1. On the other hand, Proposition 1
implies that $d \le \text{sign-rank}(S)$. Thus, by the guarantee of Welzl's algorithm,

$$\frac{s + 1}{\text{sign-rank}(S)} \le O\!\left(\frac{N^{1-1/d}}{\text{sign-rank}(S)}\right) \le O\!\left(\frac{N^{1-1/d}}{d}\right).$$

This factor is maximized for $d = \Theta(\log N)$ and is therefore at most $O(N / \log N)$.
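A quick numeric check of where $N^{1-1/d}/d$ peaks, with the constants hidden by the $O(\cdot)$ dropped. Treating $d$ as a real variable, the derivative of $(1 - 1/d)\ln N - \ln d$ vanishes at $d = \ln N$, where the value is $N/(e \ln N)$. The script below is only an illustration with an arbitrarily chosen $N$.

```python
import math


def ratio_bound(N, d):
    """The quantity N^(1 - 1/d) / d from the analysis, constants dropped."""
    return N ** (1 - 1 / d) / d


N = 2 ** 20
best_d = max(range(1, 201), key=lambda d: ratio_bound(N, d))
continuous_max = N / (math.e * math.log(N))
print(best_d, math.log(N))
print(ratio_bound(N, best_d), continuous_max)
```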


3.4.3 Counting VC classes 

Here we prove Theorems 11 and 12. It is convenient for both to set

$$f = \sum_{i=0}^{d} \binom{N}{i}.$$

Proof of Theorem 11. We start with the upper bound. Enumerate the members of each such
class $C$ as follows. Start with the (lexicographically) first member $c \in C$, call it $c_1$. Assuming
$c_1, c_2, \ldots, c_i$ have already been chosen, let $c_{i+1}$ be the member $c$ among the remaining vectors
in $C$ whose Hamming distance from the set $\{c_1, \ldots, c_i\}$ is minimum (in case of equalities we
take the first one lexicographically). This gives an enumeration $c_1, \ldots, c_m$ of the members of
$C$, and $m \le f$.
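The enumeration, together with the encoding it leads to (an index of an earlier member plus a symmetric difference), can be sketched as follows. Members of the class are represented here as 0/1 tuples, and the names `encode` and `decode` are illustrative assumptions.

```python
def hamming(a, b):
    """Hamming distance between two equal-length tuples."""
    return sum(1 for x, y in zip(a, b) if x != y)


def encode(C):
    """Greedy enumeration c_1, ..., c_m of C, plus for each later member the
    index j of a nearest earlier member and the coordinates where they differ."""
    remaining = sorted(C)                  # lexicographic order on tuples
    order = [remaining.pop(0)]             # c_1: lexicographically first member
    steps = []
    while remaining:
        best = None
        for c in remaining:                # lexicographic scan, so ties go to the first
            for j, prev in enumerate(order):
                h = hamming(c, prev)
                if best is None or h < best[0]:
                    best = (h, c, j)
        _, c, j = best
        remaining.remove(c)
        diff = [t for t in range(len(c)) if c[t] != order[j][t]]
        steps.append((j, diff))
        order.append(c)
    return order, steps


def decode(first, steps):
    """Rebuild the enumeration from c_1 and the recorded (index, difference) pairs."""
    order = [first]
    for j, diff in steps:
        c = list(order[j])
        for t in diff:
            c[t] = 1 - c[t]
        order.append(tuple(c))
    return order


C = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)}
order, steps = encode(C)
print(order, steps)
```

Since `decode` recovers the whole class from the first member and the list of steps, counting the possible step lists bounds the number of classes, which is how the argument proceeds.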

We now upper bound the number of possible families. There are at most $2^N$ ways to choose
$c_1$. If the distance of $c_{i+1}$ from the previous sets is $h = h_{i+1}$, then we can determine $c_{i+1}$ by
giving the index $j \le i$ so that the distance between $c_{i+1}$ and $c_j$ is $h$, and by giving the symmetric
difference of $c_{i+1}$ and $c_j$. There are less than $m \le f$ ways to choose the index, and at most
$\binom{N}{h} \le (eN/h)^h$ options for the symmetric difference. The crucial point is that by Theorem 30
the number of $i$ for which $h_i \ge D$ is less than $e(d + 1)(2eN/D)^d$. Hence the number of $i$ for
which $h_i$ is between $2^\ell$ and $2^{\ell+1}$ is at most $e(d + 1)(2eN/2^\ell)^d$. This upper bounds $c(N, d)$ by at
most 

i 

We now present a lower bound on the number of (maximum) classes with VC dimension
$d$. Take a family $F$ of $\binom{N}{d}/(d + 1)$ subsets of $[N]$ of size $(d + 1)$ so that every subset of size
$d$ is contained in exactly one of them. Such families exist by a recent breakthrough result of
Keevash [3Q1, provided the trivial divisibility conditions hold and $N > N_0(d)$. His proof also
gives that there are $N^{(1+o(1))\binom{N}{d}/(d+1)}$ such families.

Now, construct a class C by taking all subsets of cardinality at most d — 1, and for each 
(d + l)-subset in the family F take it and all its subsets of cardinality d besides one. The VC 
dimension of C is indeed d. The number of possible Cs that can be constructed this way is at 
least the number of families F. Therefore, the number of classes of VC dimension d is at least 
the number of $F$s:


$$N^{(1+o(1))\binom{N}{d}/(d+1)} = N^{(\Omega(N/d))^d}.$$


□ 
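The construction in this proof can be checked by brute force on a small case. The sketch below takes $N = 7$ and $d = 2$ and uses the Fano plane as the family $F$ (a classical system of 3-subsets in which every pair lies in exactly one block), so it does not rely on the general existence result. The function names and the choice of which $d$-subset to omit are assumptions of the example.

```python
from itertools import combinations
from math import comb

# Fano plane on {0, ..., 6}: every pair of points lies in exactly one block.
FANO = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def build_class(n, d, blocks):
    """All subsets of size <= d - 1, plus each block and all but one of its d-subsets."""
    C = set()
    for size in range(d):
        C.update(frozenset(s) for s in combinations(range(n), size))
    for block in blocks:
        C.add(frozenset(block))
        d_subsets = list(combinations(block, d))
        C.update(frozenset(s) for s in d_subsets[1:])   # omit the first d-subset
    return C


def vc_dimension(C, n):
    """Largest size of a subset of range(n) shattered by C (brute force)."""
    best = 0
    for size in range(1, n + 1):
        if any(len({c & frozenset(A) for c in C}) == 2 ** size
               for A in combinations(range(n), size)):
            best = size
        else:
            break            # if no set of this size is shattered, no larger one is
    return best


C = build_class(7, 2, FANO)
f = sum(comb(7, i) for i in range(3))
print(len(C), f, vc_dimension(C, 7))
```

Omitting a different $d$-subset in each block gives further classes of the same size, but the count in the proof only uses the number of families $F$.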

Proof of Theorem 12. For the upper bound we use the known fact that every maximum class is
a connected subgraph of the boolean cube [35]. Thus, to upper bound the number of maximum
classes of VC dimension $d$ it is enough to upper bound the number of connected subgraphs of
the $N$-dimensional cube of size $f$. It is known (see, e.g., Lemma 2.1 in [5]) that the number
of connected subgraphs of size $k$ in a graph with $m$ vertices and maximum degree $D$ is at
most $m(eD)^k$. In our case, plugging $k = f$, $m = 2^N$, $D = N$ yields the desired bound
$2^N (eN)^f = N^{(1+o(1))f}$.

For the lower bound, note that in the proof of Theorem 11 the constructed classes were of
size $f$, and therefore maximum classes. Therefore, there are at least $N^{(1+o(1))\binom{N}{d}/(d+1)}$
maximum classes of VC dimension $d$. □

3.4.4 Counting graphs 

Proof of Theorem 13. The key observation is that whenever we split the vertices of a $U(d + 1)$-
free graph into two disjoint sets of equal size, the bipartite graph between them defines a matrix
of VC dimension at most $d$. Hence, the number of such bipartite graphs is at most

$$T(N, d) = 2^{O(N^{2-1/d} \log N)}.$$

By a known lemma of Shearer [24], this implies that the total number of $U(d + 1)$-free graphs
on $N$ vertices is less than $T(N, d)^2 = 2^{O(N^{2-1/d} \log N)}$. For completeness, we include the simple
details. The lemma we use is the following.

Lemma 31 ([24]). Let $F$ be a family of vectors in $S_1 \times S_2 \times \cdots \times S_n$. Let $\mathcal{G} = \{G_1, \ldots, G_m\}$
be a collection of subsets of $[n]$, and suppose that each element $i \in [n]$ belongs to at least $k$
members of $\mathcal{G}$. For each $1 \le i \le m$, let $F_i$ be the set of all projections of the members of $F$ on
the coordinates in $G_i$. Then

$$|F|^k \le \prod_{i=1}^{m} |F_i|.$$

In our application, $n = \binom{N}{2}$ and $S_1 = \ldots = S_n = \{0, 1\}$. The vectors represent graphs on
$N$ vertices, each vector being the characteristic vector of a graph on $N$ labeled vertices. The set
$[n]$ corresponds to the set of all $\binom{N}{2}$ potential edges. The family $F$ represents all $U(d + 1)$-free
graphs. The collection $\mathcal{G}$ is the set of all complete bipartite graphs with $N/2$ vertices in each
color class. Each edge $i \in [n]$ belongs to at least (in fact a bit more than) half of them, i.e.,
$k \ge m/2$. Hence,

$$|F| \le \left(\prod_{i=1}^{m} |F_i|\right)^{2/m} \le T(N, d)^2,$$

as desired. □
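The covering claim used in this application, namely that every potential edge lies in at least half of the balanced complete bipartite graphs, can be verified exhaustively for small even $N$. The sketch below does this. The function names are hypothetical, and each bipartition is listed once by fixing vertex 0 on the left side.

```python
from itertools import combinations


def balanced_bipartitions(N):
    """All splits of range(N) into two classes of size N/2 (N even), each listed once."""
    vertices = set(range(N))
    parts = []
    for left in combinations(range(N), N // 2):
        if 0 in left:                       # fix vertex 0 on the left to avoid double counting
            parts.append((set(left), vertices - set(left)))
    return parts


def coverage(N):
    """Return (m, k): the number of balanced bipartitions and the minimum, over all
    pairs {u, v}, of the number of bipartitions that separate u from v."""
    parts = balanced_bipartitions(N)
    m = len(parts)
    k = min(sum(1 for left, _ in parts if (u in left) != (v in left))
            for u, v in combinations(range(N), 2))
    return m, k


for N in (4, 6, 8):
    m, k = coverage(N)
    print(N, m, k, 2 * k >= m)
```

By symmetry every pair is separated by the same number of bipartitions, a fraction $(N/2)/(N-1)$ of all of them, which is slightly more than one half.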
4 Concluding remarks and open problems

We have given explicit examples of $N \times N$ sign matrices with small VC dimension and large
sign rank. However, we have not been able to prove that any of them has sign rank exceeding
$N^{1/2}$. Indeed this seems to be the limit of Forster's approach, even if we do not bound the VC
dimension. Forster's theorem shows that the sign rank of any $N \times N$ Hadamard matrix is at
least $N^{1/2}$. It is easy to see that there are Hadamard matrices of sign rank significantly smaller
than linear in $N$. Indeed, the sign rank of the $4 \times 4$ signed identity matrix is 3, and hence the
sign rank of its $k$'th tensor power, which is an $N \times N$ Hadamard matrix with $N = 4^k$, is at most
$3^k = N^{\log 3/\log 4}$ (a similar argument was given by |[^ for the Sylvester-Hadamard matrix). It
may well be, however, that some Hadamard matrices have sign rank linear in $N$, as do random
sign matrices, and it will be very interesting to show that this is the case for some such matrices.
It will also be interesting to decide what is the correct behavior of the sign rank of the incidence
graph of the points and lines of a projective plane with $N$ points. We have seen that it is at least
$\Omega(N^{1/4})$ and at most $O(N^{1/2})$.
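The tensor-power remark can be checked in a small case. The sketch below assumes the signed identity is the matrix with $-1$ on the diagonal and $+1$ elsewhere, and uses $J - 4I$ as a rank-3 real matrix with that sign pattern; both choices, and the helper names, are assumptions of the example.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def rank(M):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_hadamard(M):
    """True if M has +-1 entries and pairwise orthogonal rows."""
    return (all(x in (1, -1) for row in M for x in row)
            and all(sum(x * y for x, y in zip(M[i], M[j])) == 0
                    for i in range(len(M)) for j in range(i)))


H = [[-1 if i == j else 1 for j in range(4)] for i in range(4)]   # signed identity
A = [[-3 if i == j else 1 for j in range(4)] for i in range(4)]   # J - 4I, same signs as H

H2, A2 = kron(H, H), kron(A, A)
same_signs = all((a > 0) == (h > 0) for ra, rh in zip(A2, H2) for a, h in zip(ra, rh))
print(is_hadamard(H2), rank(A), rank(A2), same_signs)
```

Since the rank of a Kronecker product is the product of the ranks, the $16 \times 16$ Hadamard matrix `H2` has sign rank at most $9 = 16^{\log 3/\log 4}$, and iterating gives the bound $3^k$.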

Using our spectral technique we can give many additional explicit examples of matrices
with high sign rank, including ones for which the matrices not only have VC dimension 2, but
are more restricted than that (for example, no 3 columns have more than 6 distinct projections).

We have shown that the maximum sign rank $f(N, d)$ of an $N \times N$ matrix with VC dimension
$d > 1$ is at most $O(N^{1-1/d})$, and that this is tight up to a logarithmic factor for $d = 2$, and close
to being tight for large $d$. It seems plausible to conjecture that $f(N, d) = \Theta(N^{1-1/d})$ for all
$d > 1$.

We have also shown how to use this upper bound to get a nontrivial approximation algorithm
for the sign rank. It will be interesting to fully understand the computational complexity
of computing the sign rank.

Finally, we note that most of the analysis in this paper can be extended to deal with $M \times N$
matrices, where $M$ and $N$ are not necessarily equal. We restricted attention here to
square matrices mainly in order to simplify the presentation.